A-B VARIABLE IN PSYCHOTHERAPY:
A CRITICAL REVIEW 


ANDREW M. RAZIN¹


Teachers College, Columbia University 


Research on the A-B classification of therapists is reviewed under differential
clinical outcome, personal characteristics, and analogue studies. Outcome work
found As more effective with schizophrenics than Bs, whether A-B status was
defined by therapy behavior or vocational interests. Later work found Bs more
effective with neurotics. Most recent data point to the importance of improvement
criteria (As seem more successful in effecting subjective relief, and Bs,
in effecting impulse control). Personal characteristic studies find As more
trusting, intropunitive, and tolerant of "inner" experiences and spontaneity;
more field dependent and personally involved with patients; and more oriented
to problem solving than Bs. Largely because of methodological problems,
analogues have yielded conflicting data, but they do suggest that, in extended
dyadic contact, "complementary" pairings (neurotics with schizoids) are more
effective than "similar" pairings (neurotics with neurotics, schizoids with
schizoids). It is concluded that As share with and communicate to schizophrenics
awareness of idiosyncratic perceptions and that As are able to persuade
schizophrenics to trust them. In fact, persuasiveness may link As' occupational
interests and warrants extensive study as a therapy variable.



"Does psychotherapy work?" has come to be
seen as almost a meaningless question in light 
of its failure to specify what is being exam- 
ined (type of therapy, type of therapist, or 
type of patient) in relation to what effects 
(type of outcome). Recognizing this, re- 
searchers have shifted to formulating nar- 
rower, more specified questions (Strupp & 
Bergin, 1969). One productive means of 
clarifying which specific influences have what 
specific effects has been through study of the 
therapist’s personality and behavior. One of 
the more intriguing lines of study to develop 
within this area has been the A-B variable 
as a set of therapist characteristics.

In the 15 years since Whitehorn and Betz 
(1954) first distinguished A and B therapists, 
evidence has accumulated indicating that the 
A-B variable may be strongly related to 
therapist characteristics, style of therapeutic 


1 The author is deeply indebted to Dr. Allen Bergin 
for his invaluable help and advice. 

Requests for reprints should be sent to the author, 
542 West 112th Street, New York, N. Y. 10025. 
interaction, and therapy outcome. The review
that follows critically examines this research,
considering three subareas: (a) differential
clinical outcome (the differential effects, in
actual psychotherapy, of A and B therapists, 
and interaction of these with patient and 
treatment variables); (b) personal character- 
istics (the relation of the A-B variable to 
other therapist "personality" variables); and 
(c) analogue studies² (data from quasi-therapy
and nontherapy A-B studies).


² By coincidence, this review and that by George
Chartier were both submitted to Psychological Bulletin
at about the same time. To eliminate overlap
between the two original papers, a cooperative effort 
has been made, such that this review concentrates on 
actual clinical studies and on personal characteristics, 
while the Chartier paper is more concerned with 
analogue studies, with neither review intended to 
be comprehensive. Both authors felt that, although 
the two papers differ somewhat in orientation and 
reach different conclusions, they each had contribu- 
tions to make to topics emphasized in the other's 
final paper. Thus, they have drawn on each other's 
original work to some extent. 


© 1970 by the American Psychological Association, Inc. 
DIFFERENTIAL CLINICAL OUTCOME


All of the original and some of the later 
work in this area was done by Whitehorn and 
Betz, who introduced the variable in 1954. 
Working at the Phipps Clinic at Johns Hop- 
kins Hospital, they retrospectively examined 
the attributes of 14 of the 35 psychiatric resi- 
dents who worked at Phipps between 1944 
and 1954 and who met the criterion of having 
treated at least four neurotics, four depres- 
sives, and four schizophrenics. The seven 
therapists with the highest "improvement" 
rates (upper 20%) in treating schizophrenics, 
and the seven with the lowest rates (lower 
20%) were designated “As” and “Bs,” and 
averaged 75% and 27% improvement rates, 
respectively. The 100 patients (mostly middle- 
and upper-class schizophrenics) seen by these 
therapists were hospitalized. Improvement 
data came from analyses of nurses’ notes, 
ward charts, conference notes, and discharge 
status reports. Specifically, improvement
status was determined by therapists’ appraisal,
by the psychiatrist-in-chief, by the
senior resident psychiatrist, and by four 
“objective” criteria (disposition at discharge, 
increased participation in social relationships 
with other patients, increased participation 
in clinic activity programs, and changes in 
behavior-chart ratings) all rated indepen- 
dently of the therapist. Despite this compre- 
hensiveness in rating, there were no reliability 
checks mentioned. Ratings of symptom de- 
crease, increase in social effectiveness, and 
increase in insight made possible ratings of 
quality of improvement, in addition to the 
improved versus unimproved ratings. Exami- 
nation of additional data on patients and 
therapists discounted the possibilities that 
(a) As had “easier” patients, or (b) that 
As were much more effective therapists in 
general (i.e., with all types of patients). Fur- 
thermore, the assignment of patients on a 
random, rotating basis eliminated the pos- 
sibility of covert selection of to-be-improved 
patients. 

Examining categories of therapeutic ap- 
proach, Whitehorn and Betz found that As 
and Bs differed in their treatment of pa- 
tients (at the .001 level) in several respects, 


four of which were significantly related to 
improvement. 

1. The type of relationship: As were better 
able to gain the confidence of their patients; 

2. The type of personal formulation of the
patient’s problems: As tended to understand
the motivation and meaning of patients’ behavior,
while Bs tended to formulate their
understanding solely in narrative, biographical
terms;

3. Types of strategic goals selected: As
tended to formulate as goals (a) the patient’s
developing a better understanding of his
capabilities and potentialities for constructive
conflict resolution; (b) the development of
a dependable, meaningful relationship between
the patient and therapist that would
serve to foster a new start toward greater
confidence and growth in maturity and personality.
Whitehorn and Betz (1954) called
these “personality-oriented” goals and contrasted
them with the “psychopathology-oriented”
goals that characterized Bs: supervised
living, symptom decrease, “socialization”
increase, and “insight” into pathology.
The quotes are used to indicate the vagueness
with which therapists used these terms;

4. The type of tactical pattern used: As
were actively, personally involved. They were
characterized by initiative in sympathetic inquiry,
honest disagreement, challenging of
self-depreciation in patients, setting of realistic
limits on patients’ behavior; while Bs
adopted passive, interpretive and/or instructional,
or practical care patterns.

Categorization of data among these four
areas was done by examining the patients’
individual case records. A reliability check
yielded an average interrater agreement of
88%. In addition to this fairly high figure,
the stringency and independence of the improvement
criteria (see preceding discussion)
are also noteworthy. It should be made clear,
however, that the categories were not derived
a priori, but after careful examination of
case records. Far from being “blind,” therefore,
the authors categorized data into a
scheme which they constructed out of these
data. To the extent that the study was suggestive
rather than validative, though, this
methodology is permissible and perhaps
necessary.

In 1956 Betz and Whitehorn reported on a 
study in progress in which insulin treatment 
entered as an additional variable (no patients 
in the 1954 study had received insulin or 
electroconvulsive shock treatment). In this 
study, 45 patients received both insulin and
psychotherapy, and 64 received psycho- 
therapy only. In a methodological attempt to 
bolster the 1954 findings, the authors studied 
all 18 therapists using 70% improvement rate 
as the A-B dividing line. (By dichotomizing 
all therapists instead of using upper and 
lower fifths, they made it more difficult for 
A-B differences to appear.) The patients were 
the 109 schizophrenics treated between 1950 
and 1954; there was no overlap between this 
and the 1954 patient sample. The authors, 
however, made no mention of the extent of 
therapist overlap in the two studies. In the 
"psychotherapy only" condition, As showed 
an 82% improvement rate, and Bs, a 34% 
rate. Again, the four types of therapist ap- 
proach to treatment identified in the 1954 
study predicted outcome significantly. Insulin 
treatment left the A improvement rate un- 
affected at 82%, but B rates jumped to 
82%. The authors noticed, though, that in 
this (B therapist, insulin plus psychotherapy) 
condition, patients received active, personal 
participation from therapists. Somehow, insu- 
lin treatment made B therapists more like As. 
This observation, however, was clouded by 
the fact that 85% of the patients getting 
active personal treatment were of upper or 
middle socioeconomic status, and that 59% 
of patients getting this treatment had special 
abilities or interests. This suggested to the 
authors that unless a patient's talents or 
resources were conspicuous or like his therapist’s,
other talents might go overlooked and
unactivated therapeutically. They hypothe- 
sized that As are less status-bound and more 
sensitive to a wider range of resources: they 
are therefore more able to activate the schizo- 
phrenic’s strengths. Whitehorn and Betz 
stressed the importance of active, personal
participation, noting that of the 71 patients
in both (Betz & Whitehorn, 1956; Whitehorn
& Betz, 1954) studies receiving such treatment,
89% improved. They (Whitehorn &
Betz, 1957) added, regretfully, that only 71
(30% of the sample) received such treatment.
The 1957 study (a complete report of the 
1956 “insulin” data) also revealed insulin to 
have no effect on quality of improvement (As 
still showed better quality). 

In 1960, Whitehorn and Betz presented 
follow-up data on the 209 patients from the 
preceding studies. All had been discharged for 
at least 5 years: of those rated improved, 
70% remained so. Whitehorn separately
(1960) reported these follow-up data in more 
detail. A and B therapists had 80% and 31% 
improvement rates at discharge; at follow-up, 
at least 5 years later, these figures were 
77% and 65%. This points to a relative per- 
manence on the part of As’ effectiveness, and 
a failure (or delay) on the part of Bs to 
bring about improvement that later occurred 
in any of several other ways. Betz (1963a) 
reported further follow-ups of at least 5 years 
on 155 of the original 209 patients reported 
in the 1954 and 1957 studies. Of those rated 
improved at discharge, 60% had no further 
hospitalization (40% did); of those rated 
unimproved at discharge, 15% had no further 
hospitalization (85% did). She took these 
highly significant (p < .001) results as vali- 
dating the significance of the original studies. 

Until 1956, A-B status had been defined 
in terms of patient improvement rate; but, 
in the 1956 study, Betz and Whitehorn also 
reported the results of administering the 
Strong Vocational Interest Blank (SVIB) to 
35 therapists, some from the 1954 and 1956 
studies, and some from unidentified sources. 
The 15 therapists with highest improvement 
rates were defined by their rates as As, and 
the 20 with the lowest improvement rates, 
as Bs. The authors found eight occupational 
profiles on which As and Bs differed: Lawyer
(A, p < .02), CPA (A, p < .02), Osteopath (B),
Printer (B, p < .02), Mathematics-Physical
Science Teacher (B, p < .02), Carpenter (B),
Industrial Arts Teacher (B), and
Vocational Agriculture Teacher (B). (A and
B indicate the direction of characterization.)
From these scores, a preliminary "screen"
was developed, on which 12 of the 15 As
(previously designated As by the 70% improvement
criterion) and 3 of the 20 Bs
matched at least six of the eight profiles in
the “high” (A) direction.

TABLE 1

THE TWENTY-THREE STRONG VOCATIONAL INTEREST
BLANK ITEMS WHICH DIFFERENTIATE A AND B
THERAPISTS (FROM Betz, 1967)

Item no.  Item                                          Response (a)
17        Building Contractor                           L  I  D
19        Carpenter                                     L  I  D
59        Marine Engineer                               L  I  D
60        Mechanical Engineer                           L  I  D
68        Photoengraver                                 L  I  D
87        Ship Officer                                  L  I  D
90        Specialty Salesman                            L  I  D
94        Toolmaker                                     L  I  D
121       Manual Training                               L  I  D
122       Mechanical Drawing                            L  I  D
151       Drilling in a company                         L  I  D
185       Making a radio set                            L  I  D
187       Adjusting a carburetor                        L  I  D
189       Cabinet making                                L  I  D
216       Entertaining others                           L  I  D
218       Looking at shop windows                       L  I  D
290       Interest public in a new machine
          through public addresses                      L  I  D
311       President of a society or a club              L  I  D
356a      Many women friends                            L  I  D
367       Accept just criticism without getting sore    Yes  ?  No
368       Have mechanical ingenuity                     Yes  ?  No
375       Can correct others without giving offense     Yes  ?  No
381       Follow up subordinates effectively            Yes  ?  No

Note. Reprinted from an article by B. Betz entitled
"Studies of the Therapist's Role in the Treatment of the
Schizophrenic Patient," published in The American Journal of
Psychiatry (1967, Vol. 123, pp. 963-971). Copyright 1967 by
the American Psychiatric Association, with permission of the
American Journal of Psychiatry.
(a) L = like, I = indifferent, D = dislike.


In 1960, Whitehorn and Betz examined 
SVIB data from the Whitehorn and Betz 
(1954) and Betz and Whitehorn (1956) 
studies on the 26 therapists who had taken 
the test. (This figure, incidentally, suggests 
an overlap of 6 therapists between the two 
studies: 32 (= 14 therapists + 18 therapists) 
— 26 = 6; but we cannot be certain of this, 
as some therapists did not take the test.) 
The data showed both As and Bs high on 
Physician, Psychologist, and Public Adminis- 
trator profiles; significant differences were 
found on Lawyer and CPA (As higher) and 
Printer and Mathematics-Science Teacher
(Bs higher) profiles. Scores on these four
profiles constituted a 5-point scale (“matching,”
with a score of B+ or higher, 0-4 of
these profiles).

Also constructed was a second, 11-point 
scale, based on 10 of the 23 items (see 
Table 1) that significantly differentiated As 
and Bs (0-10 were the 11 points). Both these 
scales were used to predict the improvement- 
rate status (A = 68% or more) of a second 
group of (24) therapists. Both scales were 
accurate, the 11-point scale more so (about 
80% versus about 65%). This, then, is the 
first validation study of the A-B variable; 
the authors cited the Betz and Whitehorn 
(1956) and Whitehorn and Betz (1957) 
studies as validating the 1954 study, but the 
therapist overlap between these leaves the 
study as “validating” only in the sense that 
a new sample of patients was used. 

An additional validation study was reported
here (Whitehorn & Betz, 1960). The authors
used the 5- and 11-point scales to predict
the improvement-rate status (A = 68%
or more) of 11 therapists at the nearby
Pratt Hospital. Again, the scales were both 
successful: the 5-point scale with 67% accu- 
racy, and the 11-point scale with 89% accu- 
racy. Continuing predictive tests with Phipps 
therapists, Betz (1962) reported that the to-be-predicted
group (N = 24) of the 1960
study had grown to 46. For this total N, the 
5-point scale predicted A status with 80% 
accuracy and B status with 67% accuracy; 
the 11-point scale yielded 77% and 83%. She 
later (1963b) reported on another validation 
study done with 22 Pratt therapists. Here, a 
different predictive scale was used, devised as 
follows: Pratt therapists were each given a
sum score for the eight A occupations and a 
sum score for the eight B occupations (cf. 
preceding discussion). The differences be- 
tween the sums yielded net scores, which were 
rank-ordered and divided into A (the highest
11 scores) and B (the lowest 11) groups.
(Table 2 shows the SVIB data on the two
groups.) Then, using outcome criteria similar
to those in earlier studies, therapists’ improvement
rates were examined. Group A
averaged 60% improvement, and B, 49%.
TABLE 2

VOCATIONAL INTEREST PATTERNS IN WHICH A AND B THERAPISTS AT PRATT
HOSPITAL DIFFERED FROM EACH OTHER (FROM Betz, 1963b)

                                        Group A (n = 11)     Group B (n = 10)     Significance of difference
Vocations                               Number  Percentage   Number  Percentage   χ²       p
High scores (a)
  Lawyer                                11      100          0       0
  Author-Journalist                     10      91           1       10           10.83    .001
  Advertising Man                       9       82           0       0            11.17    .001
  Mathematics-Physical Science Teacher  0       0            10      100
  Printer                               0       0            7       70           7.98     .01-.001
  Personnel Director                    2       18           7       70           3.91     .05
Low scores
  Carpenter                             11      100          3       30           7.98     .01-.001
  Forest Service Man                    11      100          4       40           6.55     .05-.02
  Vocational Agricultural Teacher       11      100          3       30           7.98     .01-.001
  Industrial Arts Teacher               10      91           3       30           5.96     .02
  Farmer                                10      91           1       10           10.83    .001
  Sales Manager                         4       36           10      100          7.01     .01-.001
  Life Insurance Salesman               2       18           9       90           8.26     .01-.001
  Real Estate Salesman                  1       9            7       70           5.96     .02
  President of Manufacturing Concern    1       9            7       70           5.96     .02
  CPA                                   0       0            7       70           8.74     .01-.001

Note. Reproduced from an article by B. Betz entitled "Bases of Therapeutic
Leadership in Psychotherapy with the Schizophrenic Patient," and published in
the American Journal of Psychotherapy (1963, Vol. 17, pp. 196-212) with
permission of the author and the American Journal of Psychotherapy.
(a) "High" and "low" scores refer to Strong Vocational Interest Blank scores;
high scale score = A or B+; low scale score = C+ or C.



Betz attributed the significant (p < .05) difference
to As' orientation toward freedom
and the exertion of leadership in this direction,
and to Bs' impersonal, didactic approach.
Regarding the Pratt studies, it should
be noted that Lichtenberg had previously re- 
ported (1958) slight, nonsignificant tenden- 
cies for successful therapists at Pratt to re- 
semble As in therapeutic style and behavior, 
although he gave no data specifically on 
schizophrenic patients. 
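
The net-score procedure described above for the Pratt therapists (sum of A-keyed scores minus sum of B-keyed scores, rank-ordered and split into halves) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_by_net_score`, the numeric coding of the SVIB scale scores, and the four therapists' data are hypothetical and are not taken from Betz (1963b).

```python
def classify_by_net_score(a_sums, b_sums):
    """Split therapists into A and B halves by net SVIB score.

    a_sums and b_sums map a therapist id to the summed scores on the
    A-keyed and B-keyed occupational scales (hypothetical numeric coding).
    """
    # Net score = A-occupation sum minus B-occupation sum.
    net = {t: a_sums[t] - b_sums[t] for t in a_sums}
    # Rank-order from highest to lowest net score (ties broken by id).
    ranked = sorted(net, key=lambda t: (-net[t], t))
    half = len(ranked) // 2
    return {"A": ranked[:half], "B": ranked[half:]}, net


# Hypothetical data for four therapists (not from the source).
a_sums = {"t1": 30, "t2": 12, "t3": 25, "t4": 8}
b_sums = {"t1": 10, "t2": 20, "t3": 15, "t4": 22}
groups, net = classify_by_net_score(a_sums, b_sums)
print(groups)
```

Note that a median split of this kind forces equal-sized groups regardless of how the net scores are distributed, which is one reason the resulting A and B labels need not agree with those from the earlier 5- and 11-point scales.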

A few comments on methodology seem ap- 
propriate here. First, there was no reason 
given for selecting the particular 10 items 
from the 23 available. Second, McNair, 
Callahan, and Lorr (1962) point to the .44 
internal consistency of the 23 items as 
“uselessly” low. This criticism is answered in 
part by the predictive ability of the 11-point 
scale revealed by these (Betz, 1962, 1963b; 
Whitehorn & Betz, 1960) studies; but these, 
too, are open to criticism. The cogency of 
the criticism, moreover, is questionable, as 
internal consistency is desirable only with a 


variable known to be unidimensional, and 
the unidimensionality of “A-Bness” is far 
from clear (cf. Carson, Harden, & Shows, 
1964). Third, on a 400-item test, chance 
could easily be responsible for the “signifi- 
cance” of the 23 items. This criticism is 
answered by the validation study results 
(i.e., these items do predict improvement-rate 
status); but these validation studies are 
themselves open to criticism. In these, as in 
other Whitehorn and Betz work, the origin 
(Betz & Whitehorn, 1956) and degree of 
overlap (Betz, 1963b; Betz & Whitehorn, 
1956; Whitehorn & Betz, 1957, 1960) of 
groups of therapists often makes it difficult 
to tell whether new data is being reported, 
old data enlarged upon, or old data re- 
reported. The two validation studies with 
Pratt therapists are a case in point. No men- 
tion is made in the Betz (1963b) study of 
the extent of overlap with the 1960 Pratt 
therapists. Also in the 1963b study, the A-B
scale was changed drastically with no explanation,
making the prediction procedure even
more unclear.

TABLE 3

CORRELATIONS BETWEEN THE VOCATIONS OF LAWYER AND MATHEMATICS-PHYSICAL SCIENCE
TEACHER AND EACH OF THE VOCATIONAL SCALES; RANK-ORDER POSITION OF EACH OF THE
VOCATIONS FOR GROUP A AND GROUP B RESIDENTS (FROM Betz, 1963b)


correlations | Rank order | cottona’s | Rankeorder 
Occupations | Occupations 
Lawyer Group A| Group B. | Lawyer Group A| Group B 

Artist 39 | —319 | 12 | 28 || Public 
Psychologist 18 36 2 2 i Administrator 6 + 
Architect .05 a2 12 14 | YMCA Secretary 04 15 338 20 
Physician 16 E 5 7 || Social Science High 
Osteopath —.11] A6 w | 2 |. School Teacher 17 AB: 14 10 
Dentist —.18 A0 23 13 | City School 
Veterinarian 43 39 Superintendent Al 16 
Mathematician —.01 37 17% 29* || Minister 26 10 
Physicist —.20 44 | 30 34+ | Musician 10 7 
Engineer — 44 45 | 24 16 CPA an 36 
Chemist =31 56 14^ 6 | Senior CPA 12 
Production Manager| —.19 A0 30 23 | Accountant —42 36" 
Farmer = 66 .60 | 33 15 Office Man —.4l 32» 
Aviator —.54 .61 28 20 Purchasing Agent —.41 43 
Carpenter —.78 68 | 44 27" | Banker —.02 40 
Printer —.51 72. 27 4 || Mortician 40 
Mathematics- | Pharmacist | 26 

Physical Science || Sales Manager .20 44 

‘Teacher —.63 26 1 || Real Estate 
Industrial Arts Salesman A7 | —.64 7 36 

Teacher 40 23 Life Insurance 
Vocational Agri- | Salesman 36 | —.72 11 42 

cultural Teacher 41 19 Advertising Man 44 | —.74 3 25 
Policeman —.00 67 37 29* || Lawyer =163 1 33 
Forest Service Man | —.61 08 | Al 29* || Author-Journalist 76 | —.54 5 20^ 
YMCA Physical | || President of Manu- 

Director —.21 „51 28 16 || facturing Concern! 12 | —.55 | 20 34 
Personnel Director .06 2 14 7 | | 


Note. Reproduced from an article by B. Betz entitled "Bases of Therapeutic
Leadership in Psychotherapy with the Schizophrenic Patient," and published in
the American Journal of Psychotherapy (1963, Vol. 17, pp. 196-212) with
permission of the author and the American Journal of Psychotherapy.

* Exceptions where rank order does not parallel Strong's.

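
The force of the third methodological comment (chance "significance" on a 400-item test) is easy to see with a little arithmetic. The sketch below is a rough illustration only: it assumes each of the 400 items was tested independently at a per-item level of .05, which the studies under review do not state, and the helper name `prob_at_least` is introduced here purely for the example.

```python
from math import comb

n_items = 400   # approximate SVIB length cited in the text
alpha = 0.05    # assumed per-item significance level (not stated in the source)

# Number of items expected to look "significant" if no item truly differs.
expected_by_chance = n_items * alpha


def prob_at_least(k, n, p):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of seeing 23 or more "significant" items under these assumptions.
p_23_or_more = prob_at_least(23, n_items, alpha)
print(expected_by_chance, round(p_23_or_more, 3))
```

Under these assumptions about 20 items would be expected to differentiate As and Bs by chance alone, so a yield of 23 is not in itself persuasive; real items are also intercorrelated, which the independence assumption ignores.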

Considering As to be like lawyers, Betz 
(1963b) reexamined the data on all (93) pa- 
tients at Phipps from 1948-1954 who stayed 
for at least a month and received no insulin 
or electroconvulsive therapy; she found A 
therapists significantly more successful than 
Bs (69% versus 33%), when A status is 
defined as high score on the Lawyer profile 
and low on the Mathematics-Science Teacher 
profile, and B as vice versa. She noted here 
that As’ and Bs’ patients are very comparable
in all measured respects: length of stay,
severity, diagnosis, and clinical characteristics.
Their SVIB (Lawyer and Mathematics-Science
Teacher) profiles, however, are quite
distinct in terms of resemblance to other 
occupations, as Table 3 (Panel 1) shows. 
Note the strong resemblance between these 
patterns and the rank-order patterns of A 
and B occupations (Panel 2). 

Betz pointed to a further A-B distinction 
in a brief but important study (1963c). 
Differential success rates are even more 
striking when schizophrenic patients are di- 
vided into “process” and “nonprocess” types. 
With process patients generally seen as more 
difficult (cf. Stephens & Astrup, 1965), As
had 71% improvement rates, and Bs, 18%; 
with nonprocess patients, As had 68% and 
Bs, 44%. 
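
One way to read these four percentages is as a simple difference of differences. The sketch below only re-expresses the rates quoted above; the variable names are illustrative, and because the underlying patient counts are not given here, no significance test is attempted.

```python
# Improvement rates (percent) as quoted from Betz (1963c).
rates = {
    ("A", "process"): 71,
    ("B", "process"): 18,
    ("A", "nonprocess"): 68,
    ("B", "nonprocess"): 44,
}

# A-B gap within each patient type, in percentage points.
gap_process = rates[("A", "process")] - rates[("B", "process")]
gap_nonprocess = rates[("A", "nonprocess")] - rates[("B", "nonprocess")]

# Interaction contrast: how much larger the A-B gap is for process patients.
interaction = gap_process - gap_nonprocess
print(gap_process, gap_nonprocess, interaction)
```

The A-B gap is 53 points for process patients and 24 points for nonprocess patients, so most of the difference comes from Bs' much lower rate with the more difficult group rather than from any change in As' rate.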

Stephens and Astrup (1965) examined 
4- to 14-year follow-up data for 236 (of the 334)
patients for whom data was available, who 
were hospitalized at Phipps between 1950 
and 1960. They classified the 63 therapists 
by four methods: the 3-, 5-, and 11-point 
scales mentioned earlier and by a modified 
14-point scale suggested by McNair et al.
(1962). Using the same criteria for discharge 
status as Whitehorn and Betz, they exam- 
ined the effects of insulin treatment, patient's 
status (process versus nonprocess), and thera- 
pist rating (A-B). 

In general, they found almost no effect of 
A-B classification. Specifically, there was 
no significant difference in discharge status 
for patients with therapists classified as A, B, 
or “unpredicted,” by any of the preceding 
four scales, although differences were in the 
predicted direction. These results, moreover, 
hold for insulin-treated patients, psycho- 
therapy-only patients, and for the pooled 
groups of both. “Unpredicted” therapists, 
moreover, did not have rates falling between 
As’ and Bs’. The patients classified “process” 
showed much lower improvement rates than 
did “nonprocess” patients. Even within A and 
B groups, the strong relation between process- 
nonprocess status and improvement holds. In 
fact, the only significant correlation between 
A versus B treatment and improvement rates 
was among process patients not getting insulin 
with therapists rated on the 11-point scale 
(As had 82% and Bs 58% rates). The au- 
thors noted that even this effect disappears 
if the patients used by Whitehorn and Betz 
(1954) in devising the original scale are 
dropped. Among the nonprocess, no-insulin 
patients, there is no correlation between dis- 
charge status and A versus B treatment. (The 
nonprocess, insulin group was too small to 
study.) The authors also found no significant 
effects (a) among patients seen by only one 
therapist, (b) when only the highest As and
lowest Bs are considered, (c) among patients
hospitalized for less than 28 days, (d)
among patients transferred elsewhere, or (e)
among patients signing out against advice.


They criticized the Betz (1963b, 1963c)
studies as unable to confirm earlier correlations
reported, since patients were predominantly
those used to devise methods for
determining A-B status, and concluded that 
no reliable method exists for measuring the 
therapist variables that can significantly 
influence therapeutic outcome. 

Their study itself, though, is open to sev- 
eral methodological (Ford & Urban, 1967) 
and other criticisms. Their indexes of fol- 
low-up status are crude: “letters, telephone 
conversations, and personal contacts with the 
patients and their relatives [Stephens & 
Astrup, 1965, p. 450]." There is no mention 
of how consistently available each of these 
sources was, how they were weighted, checked 
for accuracy, etc. Experimenters "double- 
blinded" themselves when checking process 
status against follow-up status, but it is not 
at all clear whether they did this when check- 
ing A-B treatment against follow-up status. 
The study is statistically unsophisticated. 
Instead of using analyses of variance, so that 
interaction effects could be examined, the 
authors computed correlations for each sub- 
group or sub-subgroup. Besides ignoring 
interaction effects, this procedure, in its use 
of subgroups, restricts sample sizes and ranges 
and thus makes low, nonsignificant correla- 
tions likely. Finally, the authors’ conclusion 
that "personality" and not "technique" is 
most important for predicting outcome in no 
way “disproves” Whitehorn and Betz’ view: 
the SVIB measures “personality,” not thera- 
peutic technique. It seems doubtful, more- 
over, that technique and personality are so 
easily separable in practicing therapists. 
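
The point made above about subgroup analyses restricting sample size can be made concrete. The sketch below uses the Fisher z approximation to ask whether a given correlation would reach two-tailed significance at the .05 level for different sample sizes. The correlation of .30 and the subgroup sizes of 60 and 20 are hypothetical illustrations, not values from Stephens and Astrup (1965); only the figure of 236 echoes the total follow-up sample mentioned earlier.

```python
import math


def is_significant(r, n, z_crit=1.96):
    """Approximate two-tailed test of H0: rho = 0 via Fisher's z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > z_crit


# The same hypothetical correlation evaluated at shrinking sample sizes.
for n in (236, 60, 20):
    print(n, is_significant(0.30, n))
```

With the same underlying correlation, the test passes at the larger sizes but fails once the subgroup shrinks to 20, which is the sense in which repeated subdivision makes nonsignificant results likely.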

Betz (1967) responded to the Stephens and 
Astrup study by stating that they did not 
attend to several crucial facts: (a) The 
"unimproved" category of patients became 
too small (about 30%) between 1955 and 
1960 (versus about 50% before 1955) for 
adequate statistical study; (b) the proportion
of B therapists decreased, making the therapist
pool "an unusually homogeneous group
of As," and the proportion of Bs too small
for study; (c) ataractic drugs were intro- 
duced during this time, making the sample 
unlike those (“psychotherapy-only”) of the 
1954-1957 studies (Betz & Whitehorn, 1956;
Whitehorn & Betz, 1954, 1957). 

The Stephens and Astrup study seems 
riddled by too many flaws to “discount” or 
“disprove” the Whitehorn and Betz data. If 
they had statistically examined interaction 
effects and not “homogenized” so much 
heterogeneous data (ataractic drug versus 
no drug; year-to-year differences in propor- 
tion of improved patients, and of A therapists, 
etc.), it seems very likely that the non- 
significant (A-B) differences they found 
would have become significant. In short, one 
gets the impression that they “washed out" a 
real effect by lumping together several ex- 
traneous variables, by considering a different, 
more heterogeneous population of patients 
(1950-1960 versus 1948-1954), and by using 
improper statistics to do so. 

Keeping this in mind, it nevertheless seems 
worthwhile to study the 1955-1960 patients 
and therapists. It is not clear how small “too
small” is. If 30% did not seem too small a 
proportion on which to report the effects of 
active, personal participation (Whitehorn & 
Betz, 1957), it should not be so for other 
comparisons. Absolute size rather than per- 
centage is a more important criterion; there 
are, moreover, statistical procedures available 
for both kinds of shortages. The interaction 
effects of ataractic drugs also warrant study. 

It should also be noted here that the criti- 
cism of crude data sources is also applicable 
to Whitehorn and Betz’ work. As Strupp 
(1962) and Strupp and Bergin (1969) 
pointed out, criteria are gross, and data are 
secondary. Bednar and Mobley (in press) 
showed that data obtained via more direct 
methods of measurement yield findings that 
call into serious question the Whitehorn-Betz 
conclusions. There is no report by Betz or
Whitehorn of use of the factor-analytic 
study (Gorham & Betz, 1962) results, which 
specified accurate indexes of patient change 
(asocial acting out, thought disturbance, de- 
pression, and lack of initiative). Strupp and 
Bergin also scored the early studies’ lack of 
information on the implementation of the 
therapeutic relationship (this does not seem 
to be accurate, as the Whitehorn and Betz, 
1954, study does provide this sort of data, 
at least secondarily, from nurses’ and thera- 


pists’ viewpoints) and lack of information on 
the patient’s perception of the therapist and 
of the therapeutic experience. The White- 
horn-Betz work also suffers from the use of 
several different A-B scales³ across studies.
As Stephens and Astrup found, the use of 
different scales to rate the same therapists 
yields substantially different results. Clearly, 
a standardized A-B scale is needed (see 
subsequent discussion of Campbell, Stevens, 
Uhlenhuth, & Johansson, 1968).

Betz has drawn on the A-B data in theo- 
retical formulations of schizophrenia. At- 
tempting to point out (a) underlying 
conditions giving rise to and sustaining 
schizophrenic reactions, and (b) treatment
conditions that eliminate schizophrenic modes 
of reaction and develop “more satisfactory 
personal and social capacities" (1962), she 
viewed schizophrenics as distrusting and un- 
able to live interdependently by give-and- 
take interaction. They must, therefore, avoid 
interpersonal involvement. She stated that 
schizophrenia is an “authority problem" 
(1966), in which the schizophrenic feels in- 
capable of self-direction; others (the preda- 
tors) hold coercive, exploitive power over him 
(the prey). He is thus constantly anxious, 
wary, distrustful of himself and others, and 
able to go off guard only in solitude. Con- 
stantly aware of his alienation, he feels deep 
loneliness and despair. His aloofness, then, 
is an active attempt at maintenance of psy- 
chological distance; he plays roles and stays 
uninvolved and unsatisfied. Interdependence 
is feared as submission. His inner life of fear 
is pierced by moments of defensive, retalia- 
tory anger, chagrin, or humiliation. Therapy, 
then, “works” when he can develop a trust- 
ing, confident relationship with a therapist, 
and when he locates and relies on his own 
inner resources. Progress can be described as 
the perceived movement of authority from 
external to internal location. The therapist's
mode of exemplifying the authority role and 
of coping with authority issues is crucial in 
influencing progress, which can be divided 
into three phases: (a) the patient is wary, 
fearful; the task is to achieve a mutually 
confidential relationship; authority is still
seen in the therapist; (b) exploring where
authority is to lie; and (c) shift to choice
and actions in real life is based on self-determined
planning with consultative help;
broadened social involvement and personal
satisfaction.

³ Criticism drawn from the original Chartier paper.

Theoretical formulations or “explanations” 
of the entity “schizophrenia” are all subject 
to a basic criticism: the assumption of uni- 
dimensionality, that is, the assumption that 
schizophrenia is an entity. In practice, the 
diagnosis is very frequently a catch-all for 
patients with often widely varying symptoma- 
tologies; in theory, consequently, there seem 
to be almost as many “theories” as theorists. 
It therefore seems prudent here and through- 
out this paper to remember that schizophrenia 
has never, to any general satisfaction, been 
clearly and specifically delineated, let alone 
delineated as unidimensional. The patient 
heterogeneity, therefore, in the many studies 
classifying patients as schizophrenic should 
not be ignored. Clearly, this problem hampers 
nearly all research on psychotherapy, which 
is in need of much more carefully specified 
diagnostic plans than those currently in use. 
Methods like Peterson’s (1968) “behavior 
assessment” seem promising in this regard. 

This difficulty aside, though, Betz’ notion 
of therapy as a transition of perceived au- 
thority may have value as an index of thera- 
peutic competence, prognosis, and of thera- 
peutic progress. Such uses, of course, would 
have to await clear evidence of the “reality” 
of the three phases Betz described. 

The one study examining A-B therapist 
differences with actual nonschizophrenics was 
done by McNair et al. (1962). Subjects were 
55 therapists from seven outpatient Veterans 
Administration clinics and their 40 male out- 
patients matched for scheduled treatment fre- 
quency, absence of central nervous system 
damage, history of no therapy within 90 days 
before the study, education, and socioeconomic
status ($3500-$4000 annual income);
therapists were grouped and matched for
experience and “competence” as rated by
three clinical psychologists. The A and B 
therapists were dichotomized at the median 
of scores on the 23-item scale. They used 
several ratings of patient change: Taylor's 
Manifest Anxiety (MA) scale, Barron's Ego
Strength scale, a symptom checklist, a self-satisfaction
rating scale, therapist ratings of
severity of illness, Interview Relationship 
changes (IR), and an Interpersonal Changes 
and Symptom Reduction scale (IC + SR). 
After 4 months and after 12 months of 
therapy, Bs reported greater changes among 
their patients in the “improved” direction 
than did As, significantly so on the MA scale, 
Ego Strength, and severity scores. All (pooled
A and B) patients showed significant im- 
provement on the symptom checklist, severity, 
and IC + SR. The As and Bs did not differ 
significantly on liking for the patient, interest 
in patient's type of problem, or patient's 
motivation for therapy. The As (like the un- 
successful Bs in earlier studies) significantly 
more often stated goals in concrete-descriptive 
terms, and emphasized “insight” or “understanding
reasons for behavior” as goals.

These findings, then, “complemented” the 
Whitehorn and Betz studies on schizophrenic 
persons, and led to further investigations of 
Therapist × Patient (A versus B × Neurotic
versus Schizophrenic) interaction effects. It
should be kept in mind, though, that the
“interaction hypothesis” (that As are most
effective with schizophrenics and Bs are most 
effective with neurotics) contradicts (see Foot- 
note 3) the original Whitehorn and Betz find- 
ings that As and Bs do not differ in effective- 
ness with neurotics. 

McNair et al. (1962) also reported the de- 
velopment of a 13-item scale with much more 
homogeneity (internal consistency = .76) 
than either the 10- or 23-item scales. Of the 
13 items, 11 reflect interest in “skilled labor 
or technician” activities.⁴ It is not at all
clear, however, why internal consistency for 
the A-B scale should be desirable. 

Examination of patients in this study shows 
that nearly all are of lower or lower-middle 
socioeconomic status, as compared to only 
30% of the Whitehorn and Betz patients. It 
is possible, therefore, as the authors pointed 
out, that both (McNair et al., 1962; Whitehorn
& Betz, 1954, 1957, 1960) sets of results
are due to socioeconomic status pairings;
that is, B therapists may have more
interests in common with these patients, may
have more similar backgrounds, or may be
more familiar with their daily living problems
(likewise A therapists with middle-class
patients at Phipps).

⁴ The following are Strong Vocational Interest
Blank numbers of items constituting the 13-item
scale developed by McNair, Callahan, and Lorr
(1962): 17, 19, 59, 60, 87, 94, 121, 122, 151, 185, 189,
218, 368.

Reexamination of their data (Lorr & Mc- 
Nair, 1966) suggested another interpretation: 
Differences between As’ and Bs’ behavior pat- 
terns (As show more feminine and Bs more 
masculine patterns) may have interacted with 
sex differences in patients to produce the 
differences between the McNair et al. (1962) 
and the Whitehorn and Betz studies. Carson, 
however, pointed out in his (1967) review 
that patient sex and socioeconomic status are 
held constant in all the studies (Carson & 
Harden⁵; Carson, Harden, & Shows, 1964;
Kemp, 1963; Sandler, 1965) on liking and 
discomfort, and can therefore be ruled out as 
explanatory. As analogue findings, though, the 
results Carson referred to are quite limited 
in their generality. 

In another study by McNair, Lorr, and 
Callahan (1963), outpatients were again 
used, this time to construct a predictive bat- 
tery to see what factors influenced patients’ 
quitting therapy. Therapists’ A-B scores (on 
the 23-item scale) had no relation with ther- 
apy duration. But, for some reason, the au- 
thors did not differentiate among diagnoses. 
No information is given on what these were. 
Because of this, no interaction effects could 
be examined. 

Using male graduate students in clinical 
psychology as therapists and using patients of 
these therapists whom the therapists considered
“intropunitive,” Segal (1970) analyzed
the content of scripts of their therapy ses- 
sions. He correlated A-B scores with each 
of 16 therapist attitudinal and behavioral 
ratings, looking for differences in (a) attitude 
toward patients, as seen in verbalization; 
(b) type of therapeutic activity; and (c) 
specificity of statements. Four correlations 
reached the .05 level of significance: (a) Bs
made fewer negative comments; (b) Bs were
more facilitative and encouraging of self-exploration;
(c) Bs placed less emphasis on
having patients respond to specific questions
or ideas; and (d) As were more directly interpretive.
The study thus provides tentative
support for the McNair et al. (1962) results,
although the lack of outcome measures
hinders comparison (see Footnote 3).

⁵ Carson, R. C., and Harden, J. A. The A-B
therapist “type” distinction and behavior in an interview
situation. Paper presented at the meeting of the
American Psychological Association, Los Angeles,
September 1964.

Using discharge as the improvement criterion,
where discharge was determined by
symptomatic improvement, increased social
participation, and presence of environmental
resources, Draper (1967) examined data for
389 patients, whose average hospitalization 
was 5 days. Patients were randomly assigned, 
but low “Ds” (therapists with slow discharge 
rates) had more patients under 30 years of 
age, more middle-class patients, and more 
patients with no previous psychosis. High Ds
showed 55% improvement rates; low Ds, 
30%. High Ds did better with older, middle- 
class patients and those with history of 
psychosis; they had most difficulty with pa- 
tients under 30. Low Ds used drugs more 
often. The 28 therapists were uniform on 
socioeconomic status and school records, but 
the high D group had more fathers who were 
doctors. Blind retrospective ratings of thera- 
peutic attributes found high D therapists to 
(a) use more “active participation,” as defined 
by Whitehorn and Betz (1954); (b) under- 
stand patients’ dynamics; and (c) be quiet 
or shy much more than low Ds. High D
rating was positively related to enthusiastic
involvement (thoroughness, willingness to do
duties and extras, adequacy of differential 
diagnostic thinking, rationality of treatment). 
High D rating was negatively related to tact 
with house staff and attendings, and with 
keeping summaries and progress notes. 

Measuring the relationship of D rating to 
SVIB profiles, Draper (1967) found the following
correlations (phi-coefficients):


.42 Mathematics-Science Teacher
.38 Physical Therapist
.34 Vocational Agriculture Teacher
.50 Physician
.28 Dentist
.28 Chemist
.20 Printer

Mathematics-Science Teacher and Printer
were the occupations significantly differenti- 
ating As and Bs in the Whitehorn and Betz 
studies. 

There is, then, a reversal of the Whitehorn 
and Betz findings: Bs have higher D ratings. 
It seems that high D therapists, while show- 
ing the interests of professional and business 
men, tend toward common-workman interests 
more than do low D therapists. Draper con- 
trasted details of his study to those of 
Whitehorn and Betz in an attempt to recon- 
cile the disparate findings. In his study, 
therapists were less advanced, less specialized, 
and had only brief tours of service (28 days). 
They could offer only brief help for patients.
There was therefore no intensive program 
and no chance for creative exploration. Such 
a situation could create an opportunity for 
the Mathematics-Science Teacher type of
therapist (educative, rehabilitative, well- 
formulated, healing, restorative) goals to 
“work” better. Silverman (1967) supported 
this view, stating that the findings are actu- 
ally consistent with Whitehorn and Betz’, 
since A therapists had no chance to initiate 
more active, personal involvement. Draper 
(1967) offered another possible explanation 
in the possibility that doctors’ sons may 
acquire a special “competence-in-crisis.” 

Of these explanations, Silverman’s seems 
to come closest to identifying the factors 
responsible for the disparate findings: namely, 
the use of widely disparate methodologies. 
The use of (a) discharge rate as a measure
of effectiveness, and of (b) symptomatic improvement
and increased socialization as criteria
for discharge, reflects exactly the goals
Whitehorn and Betz found among their B 
therapists. It should be no surprise, then, 
that if attainment of “B-like” goals defines 
success in a B-like atmosphere (quick dis- 
charge and therefore limited interpersonal 
contact), B-like therapists are more suc- 
cessful. 

Hyman (1968) examined 16 child thera- 
pists’ verbal therapy behavior for relations 
to the therapists' A-B scores. Twenty-three
categories of therapy behavior were rated 
with four scales: interaction process, therapist 
directiveness, therapist specificity, and a patient
factor scheme. He found no relation
between the A-B scores and any of the four
scales.

Bednar and Mobley (in press) argued 
cogently against the validity of the inter- 
action hypothesis, citing four sources of errors 
in A-B research: 

1. Whitehorn and Betz’ findings have been 
subject to interpretations unwarranted by 
their data. These original data were discriminant
function analyses (identifying potent
variables for later validation) and not pre- 
dictive validation analyses, as later studies 
have considered them. 

2. Uniform procedures for identifying As 
and Bs have been lacking. With nominal 
(A-B) categorizations based on dichotomized 
or trichotomized therapist samples, differences 
in response to one or two items could change 
the A-B status of therapists close to the mean. 
Also, therapist sample size is an important 
determinant of the level of “A-Bness” examined;
that is, small sample studies necessarily
include only less extreme As and Bs. 

3. Uniform criteria for evaluating patient 
improvement have been lacking. Since pa- 
tients improve at different rates and in differ- 
ent areas, several outcome measures are 
needed to insure that differential A-B success 
is not a result of different investigators mea- 
suring different areas of improvement. 

4. Uniform diagnostic procedures have
been lacking.
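
The sensitivity described under Source 2 can be illustrated with a minimal sketch. The scores below are invented for illustration and come from no study reviewed here; the scoring direction (totals at or above the median labelled "A") is likewise an assumption, as is the helper name `classify`. The sketch only shows that, under a median split, a change of two scale points by one therapist near the middle of the distribution can alter the classification.

```python
from statistics import median


def classify(scores):
    """Median split: scores at or above the median are labelled 'A', the rest 'B'.

    The scoring direction is an assumption made for illustration only.
    """
    cut = median(scores)
    return ["A" if s >= cut else "B" for s in scores]


# Hypothetical A-B scale totals for seven therapists (not data from any study).
scores = [4, 9, 10, 11, 12, 13, 18]
before = classify(scores)

# The therapist at index 3 answers two items differently: 11 -> 9.
shifted = list(scores)
shifted[3] = 9
after = classify(shifted)

# Indices of therapists whose A-B label changed between the two splits.
changed = [i for i, (x, y) in enumerate(zip(before, after)) if x != y]
print(before)
print(after)
print(changed)
```

In this invented sample the extreme scorers keep their labels, while both the therapist who changed answers and an adjacent colleague are reclassified, because the median itself moves.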

But the authors’ argument for Source 1 is 
not completely accurate. They stated:


There is no clear scientific evidence suggesting that 
A therapists will be more successful than B thera- 
pists in the treatment of schizophrenic clients or 
offer a different type of therapeutic relationship 
[Bednar & Mobley, in press]. 

Whitehorn and Betz (1960), however, did
report two successful validations, and Betz 
(1963b) an additional successful validation, 
in addition to the 1957 data which Whitehorn 
and Betz cited as validating. But these early 
data do not meet the need for a completely 
independent cross-validation (with different 
therapists, patients, and experimenters) of relationship
and outcome findings.

Bednar and Mobley (in press) have at- 
tempted such a study. They selected 13 As 
and 11 Bs from the 141 experienced practicing
therapists who agreed to participate in
the study. These 24 were selected on the basis
of (a) A-B status, (b) having a client with
a DIAGNO (computer) diagnosis (measured
by the Spitzer Psychiatric Status Schedule)
of schizophrenic or neurotic, and (c) having
complete process and outcome data. Three
independent, reliable (average r = .91) raters
examined tape-recorded therapy segments for 
accurate empathy, nonpossessive warmth, 
genuineness, and depth of self-exploration 
(DX). Also, therapists and patients inde- 
pendently completed the Relationship Ques- 
tionnaire, measuring accurate empathy, genu- 
ineness, nonpossessive warmth, intensity and 
intimacy, concreteness, and overall thera- 
peutic relationship. Of these nine measures, 
only one showed a significant A-B difference 
in therapeutic relationship: Consistent with 
the interaction hypothesis, raters found As 
with schizophrenics and Bs with neurotics to 
obtain higher DX ratings. The authors concluded
from these substantially negative results
(a) that the differential DX ratings
may indicate differential client responsiveness
to As and Bs, rather than actual differences
between As and Bs; and (b) that any A-B
differences in outcome occur in the absence 
of corresponding differences in relationship. 
Both these points seem to beg the question. 
Rather than inferring the absence of any 
difference between A and B relationships, it 
seems more logical to infer that the relation- 
ship measures used simply did not detect 
differences that patients did detect: There 
must be something different in what As and 
Bs are or do for similar patients to react 
differentially to them. 

Examining 10 pre-posttherapy outcome criteria
(MMPI; therapist, patient, and psychometrician
ratings of current adjustment;
Spitzer Psychiatric Status Schedule ratings 
of current distress, behavioral disturbance, 
impulse control, reality testing, and total 
adjustment; and a Q sort), the authors found 
positive patient changes on all 10 ratings. The 
As were more effective than Bs in relieving 
subjective distress (especially with schizo- 
phrenics) and in improving total adjustment; 
Bs were more successful in facilitating im- 
pulse control (especially with schizophrenics). 
Only the Q sort, however, yielded data confirming
the interaction hypothesis. The authors
concluded that the validity of the
interaction hypothesis is highly questionable 
and added: 


A-B therapists differ in the areas in which they ... 
effect change. [As] were better than [Bs] at effecting
change [in] subjective components and [Bs]
... with problems of impulse control [Bednar &
Mobley, in press]. 


Despite its comprehensiveness, the study is 
open to several criticisms. As the authors 
pointed out, both therapist and patient 
samples were probably biased. The 141 thera- 
pists were only one-sixth of those invited to 
participate; moreover, extreme ("true") Bs 
were underrepresented, as were low socioeco- 
nomic status patients. Patients were most 
likely not randomly chosen by therapists; this 
fact and the significant patient improvement 
across all patient and therapist categories 
clearly point to a restricted sample (i.e., overrepresenting
“successful” therapy pairings).
In addition, outpatient schizophrenics, which
the authors apparently used, are likely to be 
less disturbed than inpatients; Whitehorn and 
Betz, using inpatients, found greatest A-B out- 
come differences among the most severely dis- 
turbed patients. The “impulse control” (B) 
emphasis and the “subjective, inner” (A) em- 
phasis are consistent with Whitehorn and 
Betz’ and Silverman’s (1967) descriptions of 
Bs and As. The Whitehorn and Betz data, 
moreover, found the latter approach more ef- 
fective with inpatient schizophrenics. Thus, 
the Bednar and Mobley study seems biased 
against finding relationships or outcome data 
supporting the Whitehorn and Betz hypothesis 
or the interaction hypothesis. 

Despite these problems, the study is im- 
portant in pointing out A-B differences as a 
function of type of outcome criterion. The 
contributions of the outcome criterion vari- 
able, however, must be considered in addition 
to, and not instead of (as Bednar and 
Mobley suggested), the patient diagnosis vari- 
able. The study clearly demonstrates the 
value of triple-order (Therapist × Patient ×
Outcome Criterion) interaction data in psychotherapy
research. Future research will undoubtedly
show the utility of still higher
order interactions.

PERSONAL CHARACTERISTICS


Reviewing data on differences in thera- 
peutic style and outcome and in SVIB re- 
sults, Whitehorn and Betz (1960) derived 
descriptions of As and Bs. The As are 
problem-solvers; like lawyers, they seek and 
manage to find acceptable (“legal”) modes 
of behavior for patients and leeway for 
solving individual problems (“loopholes”);
they are not regulative or coercive, and are
thus more acceptable to the schizophrenic,
whom the authors see as resentful toward au- 
thority and psychologically boxed in. The Bs 
tend to see things in black and white, right or 
wrong terms; they see patients as wayward, 
in need of correction: this alienates patients. 
The schizophrenic patient sees in an A the 
values of responsible self-determination more 
rewarded than those of obedience. The As are 
perceptive of individualistic inner experiences. 
In Bs, the patient sees an emphasis on value 
systems weighted toward conformity and def- 
erence, rigidity, mechanicalness, rule-of-thumb 
approaches, and precision that does not lead 
to the development of self-trust. The As tend 
to expect and respect spontaneity and there- 
fore tend to evoke social participation. 

Whitehorn (1962) repeated the importance 
of A-type leadership to schizophrenics; with 
neurotics, he added (somewhat less conclu- 
sively, since therapist differences here are 
much less distinct than with schizophrenics),
more direct leadership is welcomed by the 
patient, provided this leadership is concerned 
with how to be a patient and with the 
teaching of “psychotherapeutic procedure,” not
with how to run his life. 

The importance Whitehorn and Betz 
place on active, personal involvement is reported
elsewhere (Sundland & Barker, 1962;
Wallach & Strupp, 1964), where questions 
measuring the extent to which the therapist 
allows himself to become personally involved 
in treatment significantly differentiated 
therapists. Bergin's (1967) review of thera- 
peutic issues noted that As' success using 
active participation confirms the importance 
of warmth-interest-acceptance and empathy 
by the expressiveness (cf. Knupfer, Jackson, 
& Krieger, 1959), freshness, and spontaneity 
reported in an analogue study by Rice (1965)
in “Type III” therapists. Bergin suggested
that proper pairing of As and Bs with ap- 
propriate patients enables them to become 
Type III therapists. Betz (1967) reached 
similar conclusions. She pointed out that As 
and Bs may have differential sensitivity to 
“avoidance” (schizoid) behavior and to “turning-against-self”
(neurotic) behavior, most
likely yielding optimal degrees of “fit” be- 
tween such therapist and patient character- 
istics. 

In a college-student analogue study exam- 
ining patients’ perceptions of therapists, 
Carson and Harden (see Footnote 5) found 
that A interviewers with distrustful, un- 
friendly (“schizoid”) interviewees, and Bs 
with trusting, friendly (“neurotic”) inter- 
viewees, were more "effective" (gathered 
more personal information) than in opposite 
pairings. Interviewees in effective pairs saw 
interviewers as more dominant. Moreover, B 
interviewers were seen by both types of inter- 
viewees as more sympathetic, warm, gentle, 
and sensitive, and less suspicious. 

In the only study to date that examined
the A-B variable in patients, Berzins, Fried- 
man, and Seidman (1969) further examined 
patient-therapist pairings. Taking the sug- 
gestion that patient-therapist complementar- 
ity, rather than similarity, is therapeutically 
more effective (cf. Carson et al., 1964;
Sandler, 1965), they hypothesized that if 
systematic differences in mode of stress ad- 
justment are related to the A-B variable, 
they should be seen in a patient population. 
They picked as patients 68 consecutive males 
applying to the Indiana University Student 
Health Service. Five psychiatrists, one psy- 
chologist, and two social workers served as 
therapists rating the patients' symptomatology
(unaware of the purpose of these ratings).
Patients took a 21-item A-B scale (15 SVIB 
and 6 MMPI items) constructed on the basis 
of a recent validation, which placed the pa- 
tients in an A, AB, or B category. The re- 
sults found As presenting therapists with 
“turning-against-self” (TAS) symptoms (de- 
pression, suicidal thought, and internalized 
anger). The Bs presented only one (externalized
anger) of the three rated “avoidance-of-others”
(AVOS) symptoms (withdrawal and
hallucinations or delusions were the other two
symptoms). Curiously, ABs were higher than 
As or Bs on anger externalization. The pa- 
tients’ self-report of symptoms yielded very 
similar results. The patients’ expectancy 
ratings revealed As to anticipate therapists 
who were good listeners and who initiated 
and maintained conversations; As also ex- 
pected to talk about feelings. The Bs expected 
therapists to “level with” them, make evalu- 
ative comments, and be concerned with 
the etiology of their problems; they expected 
a structuring, straightforward, teacherlike 
therapist. 

Berzins et al. (1969) found in these results 
strong confirmation of an A-TAS association 
among patients and a much weaker B-AVOS 
association. They attributed the latter to the 
use of a population which was too homogene- 
ous for the heterogeneous set of (AVOS) 
symptoms. They found Bs clustering around 
the less severe psychotic symptoms (e.g., 
“awkward with others”) rather than hallucinations.
This sample restriction also makes it
difficult to interpret the several nonmonotonic 
functions (ABs scoring higher than As or Bs) 
obtained. (The ABs are least influenced or
irritated by others, least unsure of others’ 
reactions, least prone to escapism, and most 
prone to insomnia.) The authors cited ABs’ 
poorer prognoses and low trustingness ratings 
as precluding ABs’ being “better adjusted,” 
but, beyond this, little can be inferred from 
these results other than that if replicated 
(with broader sampling and with therapists 
as subjects), they further complicate the 
whole A-B picture. 

The authors hypothesized that the A-B 
variable may be one of approach-avoidance, 
the latter being optimal for neurotics and the 
former for schizophrenics; they also sug- 
gested that a therapist’s TAS behavior with 
neurotic patients may be due to his “blind 
spots” and may interfere with therapeutic 
communication. Aside from the limits that 
narrow sampling imposes on this study, the 
authors provided some evidence of a rough 
parallel between A and TAS qualities, and 
much more tentative evidence for a parallel 
between the B and AVOS qualities, with As 
appearing “neurotic,” trusting, intropunitive,
turning-to-others under stress; and Bs,
“schizoid,” distrustful, extrapunitive, and self-isolating
under stress. Berzins et al. (in press)
later supported these data, finding As, on a 
“stress reaction checklist,” reporting feeling 
trapped, losing their tempers, feeling sorry for 
themselves, having trouble keeping conversa- 
tions going, feeling indifferent, “blah,” blue, 
depressed, and dejected when under stress; 
Bs reported trying to think clearly about 
what is happening, trying to do something 
“physical,” and trying to do something with 
their hands. The authors concluded that As 
see themselves as intropunitive or depressed 
in stress, while Bs appear “masculine” and 
attempt “solitary mastery” of difficulties. It 
should be noted, however, that (a) these 
(Berzins et al., in press) findings are again 
derived from analogue (male college student 
subject) “therapists,” and (b) the correla- 
tions from which these modes of stress re- 
action are inferred account for very little 
variance, with rs ranging from .12 to .36. 
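
As a worked check on point (b), squaring the reported correlations gives the proportion of variance accounted for. This is the standard conversion from r to r², applied here to the endpoints of the reported range; it is not a computation reported in the studies themselves.

```latex
\[
r^2 = (.12)^2 \approx .014
\qquad\text{to}\qquad
r^2 = (.36)^2 \approx .130
\]
```

On this reading, the stress-reaction correlations account for roughly 1% to 13% of the variance.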

Another area of research has related the
A-B variable to perceptual and cognitive
styles, specifically, field dependence. Pollack
and Kiev (1963) hypothesized that with 
the rod-and-frame apparatus—a device with 
which the subject's bodily (postural) cues
and visual (the relative position of the rod 
and frame) cues interact in his estimation of 
the verticality of a luminescent rod in a 
darkened room—As would rely on both bodily 
and field cues; since they were neither markedly
influenced by nor detached from patients,
they should therefore be neither
strongly field dependent nor independent. Bs, 
who form less personal relationships with pa- 
tients, should be either extremely field 
dependent (if their style was passive- 
permissive) or extremely field independent (if 
they were directive as therapists). Therapists
were the same as those used in earlier White- 
horn and Betz studies, and all were found 
to be more field independent than college 
subjects. Bs were more independent, and also 
more consistent, as a group over time, and as 
individuals, than were As. Bs relied more on 
postural signals. There was thus no confirma- 
tion on passive therapists. The authors sug- 
gested that passivity and directiveness may 
be opposite expressions of the same motivation,
and concluded that there may well be
constitutional or psychological limits beyond
which training cannot increase sensitivity to 
situational (e.g., spatial or transference) and 
internal (e.g., countertransference) informa- 
tion. The use of therapists from the White- 
horn and Betz studies may be criticized from 
the viewpoint of validity (e.g., any and all
phenomena related to the A-B variable may
be specific to this sample), but later studies
answer this criticism (see subsequent
discussion).

Shows and Carson (1966) successfully
replicated this study with college subjects.
Bs were again found more homogeneous; As 
were more like controls (“intermediates” on 
the A-B scale), but showed even more vari- 
ability than these controls, as did Pollack and 
Kiev’s (1963) As. They found (based on the 
original Witkin studies they cited) field
independence characterized by active, analytical,
articulated, specific, critical cognitive
functioning, as opposed to passive, global, 
vague, diffuse, uncritical functioning of field 
dependents. The results also led them to 
posit a bipolar A-B dimension, rather than a 
dichotomous typology. 

Surprisingly, performance on the Witkin 
Embedded Figures test, a two-dimensional 
measure of field dependence, had no relation 
to the A-B variable. This situation specificity 
(see Footnote 3) of different measures all 
purportedly of field dependence (i.e., the low 
reliabilities among these), leaves one hesitant 
to infer any personality differences based on 
these measurements. 

Stoler (1966) characterized extreme field 
independents as cold, distant from others, un- 
aware of social stimulus value, concerned with 
philosophical problems, individualistic, and 
strong. Carson's (1967) review found Bs 
more differentiated and, therefore, perhaps
more mature. He stated that the preceding
findings agreed with well-known sex differences
on differentiating ability and with the
sex-difference hypothesis of Lorr and McNair 
(1966). 

The most comprehensive examination of 
this area is Silverman's (1967). His review 
of research on field dependence and field 
independence resulted in the following de- 
scription of the field-independent male: He 
has relatively constant “internal” guidelines
for reacting to others and for self-definition;
he is less affectionate, less interested in other 
people, more involved in cognitive pursuits; 
he approaches problems in a more intellectual 
and impersonalized way and is less attentive 
to subtle social cues (e.g., he has poorer
incidental memory for faces and words with 
social connotations). He has greater capacity 
for dealing with nonsocial situations in recalling
task-related material. In sensory deprivation
experiments, “moderate” field independents
(which As characteristically are) are
found to yield to the kinds of perceptual
organization changes that prolonged sensory
isolation induces. Extreme field independents
(which Bs usually are) are more resistive to 
the perceptual and cognitive regression and 
depersonalization brought about by reduced 
sensory input; they describe affective and 
bodily sensations in objective terms, and are 
less receptive to task-irrelevant cues and 
therefore to low-intensity input in subliminal 
stimulation studies; they show lower respon- 
siveness to more ambiguous, more personal 
intuitive cues. 

From this information, and from A-B research,
Silverman has developed “composite”
descriptions of A and B therapists: 
A and B ts [therapists] perceive various aspects of 
their physical and social worlds differently. They 
also perceive their ps [patients] differently. A... 
is more responsive to more stimulus attributes of 
the perceptual field, including incidental social be- 
havior cues . . . to the effects of seemingly irrele- 
vant stimulation, and to changes in the organization 
of the perceptual field. He is... more capable 
of relaxing his orientation to reality and responding 
to hunches and intuition . . . thus more accepting 
of the “realness” of the schizophrenic’s perceived 
unreality, his “spread of meaning,” his depersonalization
experiences and his awe and terror . . . His perceptual
responses are more similar [than B's] to the
schizophrenic p's. [Bs counteract] . . . stimulus ef- 
fects which interfere with articulated, reality-attuned 
cognitive activity. Problem solving attempts are more 
empirically than intuitively oriented . . . [They 
share with neurotics] similar perceptions of reality 
and unreality; [they share, too, practical, goal- 
directed points of view.] [Silverman, 1967, p. 12] 

As’ similarities to schizophrenics, then, are 
(a) sensitivity to low-intensity sensory stimulation;
(b) broad range of attentiveness to
irrelevant stimuli; (c) readiness to perceive
unique relationships between various images,
ideas, and percepts; and (d) less frequently
articulated perceptual responses. Silverman
added that the pattern resembles that of
highly sensitive “nonpsychiatric” people in
less personally threatening early stages of
creative activity, and that both constructive
and destructive forces should therefore be
recognized as operating in schizophrenic 
behavior. 


Silverman’s recognition of the bases of 
communication between As and schizophrenics 
is important both in itself and in suggesting 
verbal and nonverbal therapist behavior, par- 
ticularly the communication of perception 
sharing, crucial to therapeutic effectiveness. 

In another recent study, Campbell et al. 
(1968) revised the 23-item scale in line with 
SVIB revisions that occurred in 1966. They 
developed a standard (SVIB) A-B scale and
attempted to provide normative data on it
for other occupational groups. They found
that the highest (A-most) profiles were
Author-Journalist, Lawyer, Artist, Librarian,
Advertising Man, Minister, School Superintendent,
and Life-Insurance Salesman, and the lowest
(B-most) profiles were
Carpenter, Pilot, Veterinarian, Mathematics- 
Science Teacher, Farmer, Business Education 
Teacher. They described A occupations as 
verbally oriented and more intellectual; B 
occupations belong to rough-and-ready, outdoor
doers, who were practical, straightforward,
and nonintellectual nonthinkers.
From data on medical specialists, they found 
surgeons and radiologists scoring low on A-B; 
internists, intermediate; and psychiatrists, 
high. (Note again the practical-mechanical 
versus verbal-interpersonal-intellectual orientation.)
Finally, their test-retest data on
medical school subjects showed little change
from freshman to senior year on A-B scores.
The lack of independent validity data and the
small item overlap between the new (Campbell)
scale and the earlier scales, however,
may create new standardization problems in
place of those it “solves.”

Dublin, Elton, and Berzins’ (1969) discriminant
analysis of A and B undergraduates’
responses on a personality and aptitude test 
battery tentatively supported the Campbell 
et al. findings. Dublin et al. found As to show 
higher femininity and verbal aptitude and low 
natural science aptitude. 

ANALOGUE STUDIES


As mentioned, the Chartier paper treats 
the A-B analogue studies in depth. Two 
papers, however, are reviewed here because 
of their relevance to the present author's 
conclusions. 

In Carson's (1967) review of research on 
the A-B variable, he pointed out that (a) in 
studies on liking and discomfort, none of the 
A-B subjects were actually therapists; (b) As 
and Bs were placed in real or fictitious dyadic 
encounters, that is, situations varying in simi- 
larity to actual therapy; and (c) in all of 
these, subjects reported discomfort or lack 
of interest in the type of patient they 
"should" be effective with. He concluded that 
the data on the A-B variable are unclear, 
although there is obviously some powerful 
effect involved. What the data do suggest is 
that when As’ and Bs’ engagement with the 
“other” is limited to making judgments about 
him on the basis of limited descriptive data, 
As with distrusting extrapunitives and Bs 
with trusting intrapunitives have negative re- 
actions to the person and the experience. The 
use of student subjects may be responsible 
for this, Carson stated, since they are less 
equipped to cope with others whose tech- 
niques seem to have so much personal rele- 
vance. To support this suggestion, he cited 
Stoler’s (1966) finding that residents have 
mastered this problem. Apparently the data 
for the Kemp and Carson (1967) study 
(which found that experience, i.e., psychiatric
training, virtually eliminated the “paradoxical
discomfort” effect) were not yet
available. But Carson reported that there is 
still no explanation for As' describing them- 
selves in TAS terms and Bs’ in AVOS terms; 
this result seems to him to conflict with 
Stoler’s data. To the present author, though, 
there is no conflict: Stoler’s data are based 
on subjects who never dealt with patients 
even in a pseudotherapy setting. There is a 
qualitative difference between theory or 
pseudotherapy and listening to therapy in- 
volving another therapist and patient. There 
is, then, almost no reason for similar likability
data. In view of other studies (e.g.,
Berzins & Seidman, 1968; Berzins et al.,
1969), “dislike” or “discomfort” findings
seem very likely if some sort of patient-therapist
complementarity (TAS therapist
with AVOS patient, or vice versa) is optimal 
or even necessary for effective therapy. It 
should not be surprising, then, as Carson 
concluded, that when the engagement with 
the "other" is more extended, As with dis- 
trustful extrapunitives and Bs with trusting 
intrapunitives respond with greater perceived 
collaboration, activity, and efficiency, and ap- 
pear to develop a greater sense that the other 
is behaving flexibly and cooperatively. Ex- 
tending the engagement, like increasing thera- 
pists’ experience, makes the engagement more 
like effective psychotherapy. The “powerful
effect” Carson described, then, seems real.
But it is complicated by interaction with the 
degree to which the experimental situation 
(“the engagement with the other”) resembles 
one of effective therapy (i.e., an experienced 
therapist actually dealing with a real patient 
in more than limited contact). As Chartier 
concluded in the preceding article, a clear
validation of the patient-therapist comple- 
mentarity (“interaction”) hypothesis in an 
actual therapy setting is needed. 

Trattner (1968) conducted two experi- 
ments, testing and confirming (a) that As 
influence (“bias”) low social-competence 
schizophrenics more than do Bs, and that 
Bs bias high social-competence schizophrenics 
more than do As; and that (b) As com- 
municate certain vocal qualities to low social- 
competence schizophrenics more intensely than 
do Bs, and that Bs communicate these more 
intensely to high social-competence schizo- 
phrenics than do As. Trattner chose 2 As 
and 4 Bs from 28 of the 118 male attendants 
at Boston State Hospital who (a) returned 
the 31-item A-B questionnaire (Kemp, 1963), 
and (b) scored extremely high or low on it. 
The 50 schizophrenic patients, chosen on the 
basis of scheduling availability from the 157 
schizophrenics who were not organically im- 
paired or “mentally deficient,” were randomly 
assigned to attendants. Patients were rated 
on a social-competence (SC) scale, rating age,
marital status, educational and occupational 
level; and were divided into high- and low- 
SC groups. Attendants acted as experimenters 
testing the patients’ ratings of the “successfulness”
of 10 people whom the patients saw
in photos. These 10 photos (Rosenthal
method) were standardized and each elicited 
an average rating of “zero” (neutral) in 
previous research (+10 and —10 were the 
extremes). Just before testing each patient, 
experimenters were told that the patient was 
either of a “personality type” that averaged 
+5 (positive) or of a type that averaged —5 
(negative) ratings. The patient order was
counterbalanced, patients were randomized
according to alleged “type,” and Trattner
was “blind” regarding the patients’ SC until 
after experimenters tested patients. As pre- 
dicted, A attendants biased low-SC patients, 
in both “+” and “—” directions, more than 
did Bs; and Bs biased high-SC patients more 
than did As (p= .05). 

In the second experiment, Trattner had 13 
Harvard College males rate tapes of experi- 
menters in the preceding tests on nine quali- 
ties: discomfort, awareness of the other, 
coldness-distance, sophistication, self-confidence,
dominance, professionalness, masculinity,
and warmth-friendliness. (Interrater reliabilities
were fairly high, ranging from .58 to
.96, with a median of .77.) There were no
A-B differences on any of the nine, but
there were significant Patient × Experimenter
(SC × A-B) interactions on all nine: As with
low-SC patients and Bs with high-SC patients 
were rated higher (than in opposite condi- 
tions) on all qualities but discomfort and 
coldness, on which they were rated signifi- 
cantly lower (overall p= .03). Trattner 
grouped the seven most significant qualities 
(all but coldness and awareness) into “Social 
Control,” which he concluded was the major 
dimension mediating the interaction. Addi- 
tionally, he suggested that (a) warmth and 
masculinity are most important in creating 
ease and stability; (b) this firmness is desired
by patients, perhaps as a welcome relief 
from past double-bind experiences; and (c) 
patients communicate their level of SC non- 
verbally to experimenters. He also noted less 
discomfort in more “effective” pairings than 
in less effective pairings.

This study can be criticized for using a 
sample of experimenters which was not only 
very small, but also very select: Trattner 
noted that the poor questionnaire return was 
largely due to hostile reactions among prospective
experimenters to the study. The use
of only two As under these circumstances
raises other “explanatory” possibilities.
Trattner, moreover, seemed to ignore the 
weakness of analogue data (e.g., considering 
professional training a complication, and ap- 
parently attributing little importance to the 
Kemp and Carson, 1967, finding that experience
eliminates most of the paradoxical discomfort
effect). Yet Trattner interpreted his
own findings of less discomfort as being due
to the use of professional workers. In spite 
of these weaknesses, the study is valuable 
because it is a beginning of research on the 
nature of communication in a situation of 
effective “influencing.” It is necessary to 
identify what As and Bs do in therapy if the 
A-B variable is to have meaning beyond an 
SVIB categorization. The importance that 
communication of social control, social com- 
petence, warmth, and masculinity have in a 
situation in which actual patients are influ- 
enced by those who spend so much time with 
them (e.g., attendants) clearly warrants more 
extensive study. 


CONCLUSIONS 


Where do all these data lead us? Conclu- 
sions are not immediately clear. As Carson 
(1967) pointed out, the A-B variable is 
powerful, but the scale defies description in 
terms even remotely related to psychother- 
apy; it is therefore difficult to get adequate 
conceptualizations via correlations with well- 
known personality scales. He reported an un- 
published study at Duke University that 
attempted to get at the A-B dimension via 
factor analysis. Five factors were identified:
(a) disinterest in mechanics (A); (b) inter- 
personal ascendancy (does not differentiate 
As and Bs); (c) disinterest in manual activi- 
ties (A); (d) personal adjustment and secu- 
rity (does not differentiate); and (e) engi- 
neering interest (B). Because the psycho- 
therapeutic relevance of these is not obvious, 
he concluded, the A-B scale is an indirect 
though highly correlated measure. 

It does seem possible, however, to impose 
some order on the available data and to point 
out fruitful directions for future research. 
Despite methodological differences and flaws, 
it seems likely that As have some “affinity”
with schizoids. Silverman pointed out (1967) 
some of the commonalities: sensitivity to 
low-level, peripheral, and/or “inner” stimuli 
that Bs seem to ignore or suppress. As also 
seem, in their acceptance of these perceptions, 
to be able to communicate back (Travland, 
1968) to patients in a manner showing that 
they can and do perceive certain behavior 
and perceptions in schizoids without con- 
demning them; that is, they both send and 
receive this kind of communication; they can 
empathically perceive the schizoid patient’s 
behavior and perceptions. Bs seem unwilling 
or unable to do this. As' flexibility seems 
crucial here. As differ from schizoids in, 
among other respects, being more trusting and 
intropunitive, and obviously in being much 
better able to deal with others, as the 
“complementarity” and other findings suggest. 
As seem to be both sensitive to the kinds of 
perceptions and orientations that schizoids 
have, and yet able to “absorb” these adaptively
into their lives, while schizoids’ sensitivity
to inner or peripheral stimuli may be
more a result of turning away from more
“objective,” or more usual social stimuli. No
ready explanation occurs for “paradoxical
discomfort,” but there may not be need for 
one, since, as Kemp and Carson (1967) have 
shown, experience as a therapist virtually 
eliminates the effect; it may be a function of 
restricting the therapists’ freedom as thera- 
pists, as Berzins and Seidman (1968) sug- 
gested; there is little evidence for it outside 
of analogue studies. It may well be that 
simultaneous sensitivity to “schizoid-like” 
perceptions, that is, empathy with schizoid
experiences, and to societal rejection of these
as “unreal,” is responsible for As’ discomfort.
(Bs’ “rejection” of this type of perception for 
a more “empirically” structured reality is 
the norm.) It also seems likely that experi- 
ence as a therapist has helped A “master” 
or accept this sensitivity without suppressing 
it, so that the combination of flexible sensi- 
tivity and experience gives A optimal chances 
for therapeutic success with schizoids. 

Both As and Bs seem to possess some 
ascendant or aggressive quality that is not 
incompatible with empathy (although this 
empathy varies with patient type) or with
feelings of security about themselves. The 
affinity of Bs with neurotics has not been 
well-demonstrated outside of analogues; and, 
although undeniable tendencies in this direc- 
tion are clear, no study can claim unambigu- 
ous confirmation. Thus, more conclusive evi- 
dence is needed before speculation on the 
nature of this affinity can be meaningful. 
Related to As’ ability to perceive and communicate
their perception and acceptance of
schizoidlike perceptions is a commonality
among A occupations (Lawyer, CPA, Author-Journalist,
Advertising Man, Sales Manager,
Life-Insurance Salesman, Real Estate Sales- 
man, President of Manufacturing Company) 
that seems to have gone unnoticed by pre- 
vious researchers. Of the eight occupations, 
at least six involve some element of “selling
oneself,” not in a prostituting sense, but in
the way that a good salesman is said to be
able to sell himself. (The apparent exception,
CPA, may be important for its “loophole-finding”
ability; President of Manufacturing
Company does involve much “selling” but in
ways that may not be so obvious.) Bs seem
not to want to sell themselves, but to be 
alone (physically and psychologically) and 
to deal precisely with objects. An examina- 
tion of the 23-item scale (Table 1) confirms 
this observation, with two possible exceptions 
(items 90 and 216). But even with these 
exceptions, the items suggest a persuasiveness 
in As that seems to be an important part of 
the affinity discussed earlier. Although A-B 
scores have not been related to persuasiveness,
the latter has been found important in
therapy. Truax, Fine, Moravec, and Millis
(1968) have recently found resident psychia- 
trists rated high in “persuasive potency” to 
effect greater patient improvement. Equally 
important, this potency seems to operate 
somewhat independently of level of empathy 
and genuineness in effecting improvement.
This suggests that attributes related to per- 
suasiveness (e.g., status, credibility, reward- 
ingness), traditionally not considered as 
related to therapeutic style, may prove to be 
at least as important to outcome as the usual 
“therapy” variables. Extensions of work like
Trattner’s (1968) should be important in
uncovering both types of data. 

Examining the therapy situation for per- 
suasiveness is in line with current research 
attempting to relate attitude-change data to 
the therapy situation (e.g., Goldstein, Heller, 
& Sechrest, 1966; Strong, 1968), where con- 
cepts like cognitive dissonance, degree of dis- 
crepancy, etc., seem promising and relevant. 
Paradoxical discomfort, for instance, may be 
the uncomfortable experience of dissonance 
in which the therapist feels positive toward 
the patient (shares perceptions, understands 
his motivations) but not toward the patient’s 
“crazy” behavior. This unbalanced or dis- 
sonant situation creates discomfort in As, 
who reduce the dissonance by working to 
effect changes in the patient. The situation 
is not so uncomfortable for an experienced 
therapist because he has experienced success 
in effecting changes in patients; that is, the 
therapist’s dissonance is reduced by his 
knowledge that the dissonance is temporary. 

In addition to persuasiveness, therapist and
patient socioeconomic status, therapist and
patient sex-interest patterns, type of neurosis 
or schizophrenic reaction (i.e., more precise 
assessment of the type of patient), and thera- 
pist orientation are all possibly related to 
A-B, and their relations should be examined. 
It seems especially surprising that no one 
has tested Betz’ (e.g., 1966) contention of 
the importance of the therapist’s authority
orientation, as this variable has been studied 
extensively in other connections and seems to 
be important in therapy. 

In line with comments on Trattner's (1968) 
work, it clearly seems more valuable at this 
point not to pursue correlational analogue
studies of the relation of the A-B variable 
to several other dimensions—at least not 
until “A-Bness” has been more clearly speci- 
fied as a (set of) dimension(s) characterizing 
in-therapy behavior. The methodological problems
of the analogues (see Footnote 3) should
make this need all the more clear. It is from 
therapy situation variables (such as authoritarianism,
social control, persuasiveness, complementary
versus similar patient-therapist
pairings, change in the patient's view of the
locus of authority, the nature of the patient's
problem identified and/or focused on by the
therapist, etc.), in interaction with patient 
and therapist demographic and diagnostic 
variables—that is, the relation of all these to 
A-Bness and to outcome—that the most 
useful and vital data should come. 

No matter how “powerful” the A-B mea- 
sure may be, it is obviously only one of sev- 
eral measures that may describe or relate to 
therapeutic behavior and its effects. It is also 
important to recognize the A-B variable as 
a continuum, not a bipolar typology. (The 
large middle group, ABs, or “unpredicteds,” 
whose data frequently do not fall between As’ 
and Bs’, must be considered.) Furthermore,
the lumping together of patients into “schizophrenic”
and “neurotic” categories makes it
easy to ignore heterogeneity within these 
groups and the continuous nature of severity 
of disturbance—it, too, is multidimensional 
and not typological. With these considerations 
in mind, it seems to the present author that 
further research in the areas discussed 
may well yield answers to some of the most 
important questions about effective psycho- 
therapy. 

The present review traces the development of research on A-B therapist
“type” as an important variable in psychotherapeutic outcome. Two sets of
clinical studies which suggest that therapist type and patient characteristics
interact in producing differential outcome are critically reviewed. Subsequent
research, largely of an analogue variety, is then evaluated with particular
respect to the adequacy with which natural psychotherapy conditions have
been approximated. It is argued that the extensive laboratory research has
failed to increase understanding of the original phenomena and that the
analogue studies constitute a case of premature simplification, since the original
clinical research provided an insufficient basis for the interaction hypothesis.
Implications for future research are discussed, with emphasis on the necessity
of verifying A-B therapist type effects in a naturalistic context.

With the advent of more sophisticated
conceptions of psychotherapy has come the
recognition that psychotherapy is not a uni-
tary process and is not applied to homogene-
ous patients by homogeneous therapists.
Recent reviews of psychotherapy research
(e.g., Kiesler, 1966; Strupp & Bergin, 1969)
make it abundantly clear that the question of
whether psychotherapy is effective is an inap-
propriate one which must be reformulated in
terms of the interaction between therapist,
technique, patient, and outcome variables.

One particularly promising variable with
interactional implications, the A-B therapist
“type,” has received increasing attention in
recent years. From the work of Whitehorn
and Betz with schizophrenic patients at the
Phipps Clinic (reviewed by Betz, 1962,
1967) came the suggestion that the personal-
ity of the therapist is associated with varia-
tion in treatment outcome. Subsequent re-
search indicated the possible significance of
patient variables and has led to experimental
attempts to explicate therapist-patient interaction
effects. The potential impact of such
effects, if reliably demonstrated, on current
notions of professional training, therapist and
patient selection, and psychotherapy process
and outcome, cannot be overestimated. How-
ever, investigations of correlates of the A-B
dimension have often produced unexplainable
and sometimes paradoxical results (Carson,
1967).

1 The review of the literature was completed in
April 1969. The author is indebted to H. S.
Arkowitz, R. H. Fisher, C. E. Rosenberry, and M.
Shaffer for help with preliminary drafts, and to
J. I. Berzins and E. Seidman for making unpub-
lished materials available. The author is especially
grateful to E. Lichtenstein, whose editorial and
substantive contributions amounted nearly to co-
authorship.

Requests for reprints should be sent to the au-
thor, Department of Psychology, Arizona State Uni-
versity, Tempe, Arizona 85281.

The present review offers a critical evalua- 
tion of major findings on the A-B therapist 
variable* and also argues that the often confusing
results may be attributed to a premature
leap into the laboratory (Bordin, 1965).
That is, since the analogue studies reviewed 
are explicit attempts to relate phenomena to 
the original clinical research, a source of 
concern is the soundness of the strategy 
underlying these laboratory simplifications. Of 
particular interest is the similarity of the 
“therapists,” the “patients,” and their inter- 
personal relationship to their counterparts in 
real life. For heuristic clarity, the empirical
evidence is grouped according to the degree
of departure from natural psychotherapy
conditions, a framework prompted by Cart-
wright's (1968) critique of one study herein
reviewed. Significant findings are summarized
under the four categories of primary research
focus thus far: cognitive style, subjective
reactions, behavioral responses, and percep-
tion by patients. The present approach most
clearly contrasts with an earlier review by
Carson (1967) in the inclusion of many
recent studies, the sharper distinction between
clinical and analogue research, the more
critical nature of conclusions drawn, and the
implications for future investigation of the
A-B therapist variable. Specifically, it is
argued that the clinical research has provided
insufficient justification for the laboratory
studies that followed, that as a result our
knowledge of the A-B variable has not been
advanced, and that verification of the natural
phenomena is now necessary.

* The present author and A. M. Razin submitted
independently written reviews nearly simultaneously
to this journal. Subsequent collaboration, intended
to minimize duplication, led to the focus of the
present paper on analogue research, whereas the
Razin review emphasizes the clinical studies. Each
author has borrowed observations from the original
paper of the other, but both reviews essentially
retain the interpretations and arguments of the
original versions. As a result of this cooperative
effort, neither review is individually comprehensive,
and the present paper treats summarily some issues
which Razin reviews more extensively.


CLINICAL STUDIES 


The major findings of two sets of clinical 
studies, briefly summarized as follows, provided
the original impetus for research on
the A-B variable. The Whitehorn and Betz 
studies (summarized by Betz, 1962, 1967) 
were initiated in the 1940s by clinical obser- 
vations which suggested that the experience 
of the interpersonal relationship with the 
psychotherapist was the crucial factor in 
effecting favorable change with schizophrenic 
patients. Subsequent focus on therapists who 
were differentially successful with schizo- 
phrenics but equally effective with nonschizo- 
phrenics produced evidence, based primarily 
on retrospective analysis of clinical notes and 
treatment records, that the therapists mani- 
fested different clinical styles. Further efforts 
to discriminate between such therapists, with 
the aim of determining how they differed as 
persons, disclosed that psychotherapists who 
obtained high success rates with hospitalized 
schizophrenics (Type A) differed from those 
with low success rates with such patients 
(Type B) on four scales of the Strong Voca- 
tional Interest Blank (SVIB). The As scored 
high on the Lawyer and CPA Scales and low 
on the Printer and Mathematics-Physical
Science Teacher Scales, whereas Bs showed
the opposite pattern on these four scales. 
Both As and Bs scored high in three voca- 
tions: Physician, Psychologist, and Public 
Administrator. Combining the earlier clinical 
observations with the assumption that the 
statistically significant SVIB differences re- 
flected attitudes characteristic of persons en- 
gaged in the four professions, the investiga- 
tors inferred that the therapists represented 
different clinical styles and complex personality
constellations which influenced their ability
to respond effectively to schizophrenic
patients. In brief, As were seen as possessing 
a problem-solving approach, including genu- 
ineness, respect, sympathetic independence, 
perceptive attunement to the patient’s inner 
experience, and expectation of responsible 
self-determination. Bs, in contrast, were de- 
scribed as mechanical, passively permissive or 
authoritatively instructive, concerned with 
symptom reduction rather than the use of 
assets, and solicitous of deference and con- 
formity (Betz, 1967). 

Item analysis subsequently identified 23
items on the SVIB which reliably differenti- 
ated between therapist types, 10 of which 
were selected and cross-validated in the same 
clinical setting (Whitehorn & Betz, 1960) 
and on a small sample (N = 11) at another 
hospital (Lichtenberg, 1958). Scores on some 
combination of the SVIB items or scales then 
became the defining measure of A-B therapist 
status in all future studies. 

An excellent study by McNair, Callahan, 
and Lorr (1962), designed to determine the 
applicability of Whitehorn and Betz’ major 
findings to a nonschizophrenic male outpa- 
tient sample, produced complementary results. 
That is, B therapists were significantly more 
successful than A therapists on several out- 
come criteria. Analysis of a number of sub- 
jective and behavioral therapist responses 
produced only one significant difference: Independent
ratings of therapists’ description
of goals at the initiation of therapy revealed 
that As more frequently stated problems and 
goals in concrete-descriptive terms. The au- 
thors interpreted this finding as providing 
only minimal support for the notion that 
differences in patient response were due to 
therapists’ reactions to them, and instead saw
fit to explain their data in terms of sex- 
role or social-class-related variables (Lorr & 
McNair, 1966). 


Critique 

Combination of the Whitehorn and Betz 
and the McNair et al. studies generated the 
“interaction hypothesis” which has guided 
much of the subsequent research on the A-B 
variable. In its most general form, the 
hypothesis is that A-B therapist "type" inter- 
acts with patient variables related to diag- 
nostic category such that As and Bs achieve 
greater success with schizophrenic and neu- 
rotic patients, respectively. Marked differ- 
ences between the studies suggest that this 
inference may be unwarranted. 

First, the measures employed to assess both 
process and outcome variables differed in sev- 
eral respects. Whitehorn and Betz (see Betz, 
1962) conducted a post hoc search of case 
histories for therapist response differences. 
Improvement was evaluated by means of 
therapist and senior staff judgments, dispo- 
sition at discharge, and nurses’ notes and be- 
havior ratings. In contrast, the process mea- 
sures used by McNair et al. (1962) consisted 
of therapist ratings, on standard forms, of 
their own and their patients’ responses at the 
beginning and over the course of therapy. 
Outcome measures, administered prior to, 
during, and following treatment, included 
four psychometric tests and three change 
measures based on therapists’ ratings, all of
which were drawn from previous research. 

Second, the patient samples differed in ways 
other than their diagnostic categories. Most 
of the McNair et al. (1962) patients were of 
lower-class or lower-middle class background 
and all were males, whereas a minority of
the Whitehorn and Betz patients were of
analogous socioeconomic status, and that sample
probably consisted of at least 50% females (Lorr &
McNair, 1966). The possible effects of hospitalization
were also uncontrolled.

Third, it should be noted that the White- 
horn and Betz series employed different ver- 
sions of the A-B scale in different studies (cf. 
Betz, 1958; Betz, 1963; Whitehorn & Betz,
1960), whereas McNair et al. used still another
variant in selecting their therapists.
The import of this difference in selection 
procedures is deferred for later discussion, 
since similar problems with more recent 
research are also evident. 

Finally, it should be reiterated that White- 
horn and Betz found no outcome differences 
between As and Bs in the treatment of non- 
schizophrenic patients, including those diag- 
nosed as neurotic (Betz, 1962). The McNair 
et al. (1962) sample consisted only of non- 
schizophrenic outpatients. 

The inconsistencies in process and outcome 
measures, sample characteristics, and thera- 
pist selection procedures between the White- 
horn and Betz series and the McNair et al. 
study subject the findings to a variety of 
interpretations, since no clear basis for com- 
parability has been provided. It is noteworthy 
that the interpretation which served to stimu- 
late subsequent laboratory investigation—the 
interaction hypothesis—has not only re- 
mained undemonstrated in the clinical studies, 
but is actually counterindicated by compa- 
rable success rates of As and Bs with 
neurotics (Betz, 1962).³


ANALOGUE STUDIES USING REAL
PSYCHOTHERAPISTS


This group of studies is characterized by 
the use of actual psychotherapists, in ana- 
logue situations, in attempts to identify cor- 
relates of the A-B therapist variable. 

Pollack and Kiev (1963), using the same 
male staff members of the Phipps Clinic as 
Whitehorn and Betz, found that A and B 
therapists differed significantly in spatial 
orientation, as measured by the Witkin Rod 
and Frame Task (RFT), Bs demonstrating 
more field independence than As. The authors
interpreted their results as consistent
with both the Whitehorn and Betz and the
Witkin data (e.g., Witkin, Dyk, Faterson,
Goodenough, & Karp, 1962) on cognitive
style and its personality correlates. That is,
Bs were seen as heavily reliant upon internal
stimuli and As as more receptive to both
situational and body cues. The latter perceptual
organization, they argued, is conducive
to the flexible, reciprocal, more personal
nature of the therapeutic relationship
characteristic of A therapists with schizophrenic
patients, which in turn increases the
probability of therapeutic success. Bs, in
contrast, were viewed as more precise, detached,
and less capable of continual organization
of cognitive and conative inferences
based on situationally derived perceptions.

3 The use of “schizophrenic” and “neurotic”
throughout this paper reflects the terminology of
the studies reviewed, and in no way intends to
imply that these are meaningfully precise labels
for homogeneous patient groups. It is assumed that
the reader, and particularly the potential researcher
on the clinical studies to be suggested later, is
aware of the growing demand for more specific
delineation of independent as well as dependent
variables in psychotherapy research (cf. Geld’
1965; Silverman, 1964; Strupp & Bergin, 1969).

Using tape recordings of psychiatric interviews
of schizophrenic patients, Stoler (1966)
found that psychiatric residents classified as
As rated the patients in general as more
likable than did Bs. Moreover, Bs reported
more likability for “less process” than for
“process” patients, whereas As’ ratings were
not significantly different across this patient
category. All therapists, regardless of A-B
type, indicated a preference for working with
neurotic rather than psychotic patients. In
addition, Bs’ likability ratings were better
predictors of outcome than were As’. However,
the low level of intrarater reliability (mean
r = .57) renders these findings of dubious
value.⁴

Kemp and Carson (1967) presented psychiatric
residents with brief written descriptions
of two “patients” whose symptoms were
assumed to represent typical neurotic or
schizoid complaints. Analysis of a questionnaire
completed by the therapists indicated
that As rated both patients as having a poorer
capacity for self-observation than did Bs, and
that As with schizoid and Bs with neurotic
“patients” rated themselves as more uncomfortable
than did therapists in the opposite
pairings. This latter finding is contrary to
most conceptualizations of the
therapist-type–patient-type interaction
hypothesis, and has received attention in
other studies that are reviewed later.

4 Contribution drawn by this author from the
original Razin manuscript.

Cohen (1967) used both professional
psychotherapists and male undergraduates in
examining the relationship between A-B type 
and four categories of therapist expectations 
(attraction toward the patient, and the 
process, goals, and outcome of therapy). 
Subjects responded by means of a series of 
rating scales after each of 10 written state- 
ments chosen to represent either neurotic 
or schizophrenic traits. Of the numerous
dependent variables, only one measure of 
process expectation showed a significant and 
predicted interaction with therapist type.


Critique

Pollack and Kiev's (1963) provocative 
finding of differential response to the RFT 
would certainly appear to demand further 
investigation, particularly with therapists from 
a different sample than that of Whitehorn 
and Betz, as correlates of A and Bness may 
be specific to this sample (see Footnote 4). 
In the meantime, one must exercise caution 
against premature inferences about personal- 
ity differences from performances on a single 
perceptual task. A number of perceptual 
measures purport to assess field dependence- 
independence, but the correlations between 
them, while usually significant, have typically 
not been large enough to account for more 
than 20% of the relevant variance (cf. Elliot, 
1961). Similar evidence for the situation 
specificity of perceptual response dispositions 
is offered in a study, reviewed later, which 
found that A and B college subjects differed 
significantly on the RFT but not on the 
Witkin Embedded Figures Test of field de- 
pendence (Shows & Carson, 1966). 
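
The arithmetic behind the 20% figure can be made explicit. The following is a minimal sketch, assuming only the standard relation that the proportion of variance two measures share equals the square of their correlation; the function names and the example correlations are hypothetical and are not drawn from Elliot (1961) or any study reviewed here.

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2


def correlation_for_shared_variance(proportion: float) -> float:
    """Magnitude of the correlation implied by a proportion of shared variance."""
    return math.sqrt(proportion)


# A ceiling of 20% shared variance corresponds to |r| of roughly .45.
ceiling_r = correlation_for_shared_variance(0.20)
print(f"r at 20% shared variance: {ceiling_r:.2f}")

# Illustrative correlations (hypothetical values, not taken from any study).
for r in (0.30, 0.40, 0.45):
    print(f"r = {r:.2f} -> shared variance = {shared_variance(r):.1%}")
```

On this reading, two field dependence measures can correlate significantly and still leave some four-fifths of the variance in either one unexplained by the other, which is why performance on a single task is a weak basis for personality inferences.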

The other three studies cited earlier are 
open to several criticisms. In all cases, several 
elements may have been operating in the 
task which were not at all relevant to a 
therapist’s usual mode of functioning. Certainly
there was no interpersonal interaction
of the reciprocal, response-contingent form 
one encounters in natural psychotherapy con- 
texts. Also missing were the nonverbal cues 
which presumably affect person perception to 
considerable degrees. Furthermore, except for 
Stoler (1966), the patient stimuli were not 
productions of patients at all, but rather creations
of the experimenters, presented in written
form. The artificiality of these conditions
suggests that positive and negative findings 
are equally suspect, and indeed, the Stoler 
(1966) and Kemp and Carson (1967) results 
seem almost contradictory. 

The main conclusions drawn from this set 
of studies are that therapists who differ on 
the A-B scale may also differ in perceptual 
processes, and evaluative attitudes may play 
a role in mediating their differential effective- 
ness with patients of different diagnostic cate- 
gories. The full implications of these very 
promising and interesting lines of evidence, 
however, await confirmation and clarification. 


ANALOGUE STUDIES: PSEUDOTHERAPISTS 


In terms of number, by far the most exten- 
sive investigation of the A-B therapist dimen- 
sion has typically focused upon the responses 
of undergraduate male college students 
selected on the basis of having obtained ex- 
treme scores on one form or another of the 
A-B scale. These studies have assumed that 
the A-B variable is an individual difference 
measure, the relevant correlates of which 
should be manifested in the general popula- 
tion and generalizable to professional psycho- 
therapists. Since this research has been pri- 
marily concerned with the interactive effects 
of therapist-type and patient-type, experi- 
mental manipulation of "patient" character- 
istics usually has been in terms of either 
“schizoid” versus “neurotic” modes of adjust- 
ment, or the Phillips and Rabinovitch (1958) 
symptom-cluster categories of avoidance-of- 
others (AVOS) versus turning-against-self 
(TAS). For ease of exposition, and in accord- 
ance with the general hypothesis usually 
under study, As with AVOS (schizoid) and 
Bs with TAS (neurotic) patient-type stimuli 
are referred to here as “compatible” pairings,
and the opposite combination as “incompatible”
pairings.


Subjective Reactions 


Kemp (1966) read passages descriptive of 
either AVOS or TAS patient characteristics 
to male college subjects, who then listened in 
groups to a prerecorded interview of two 
actors role playing a therapist and patient.
The tape, which was identical for all subjects,
was stopped at 28 predetermined points for 
the students to choose a therapeutic response 
from multiple-choice alternatives. The sub- 
jects’ choices and their responses to a subse- 
quent questionnaire provided the dependent 
variables. Of an apparently quite large num- 
ber of questionnaire items, only two produced 
significant differences, and both were opposite 
to prediction. As with AVOS and Bs with
TAS “patients” (those with whom they
would presumably be most effective) reported
more discomfort and more difficulty in select- 
ing their helpful responses. Kemp’s “para- 
doxical discomfort” finding failed to replicate, 
however, when two methodological changes 
were introduced: (a) free response rather
than multiple-choice alternatives, and (b)
separate and clearly different recordings of
actors portraying the two patient types
(Berzins & Seidman, 1968). While subjective
discomfort ratings did not differentiate between
As and Bs, subjects in “compatible”
pairings did report significantly more satis- 
faction with their helpful responses than 
those in “incompatible” pairings, as predicted. 

Another investigation of attitudinal reac- 
tions required medical students to evaluate 
“case histories” written to represent a schizoid 
and an intropunitive-neurotic patient (Kemp 
& Sherman⁵). Contrary to predictions,
“compatible” therapist-patient pairings were 
characterized by less interest in treating the 
patient, less confidence in treatment outcome, 
feelings that the patient was less like an ideal 
patient, and anticipation of more difficulty in 
determining etiology. These findings were not 
replicated, however, when psychiatric resi- 
dents served as subjects; instead, As rated 
both patient stimuli more negatively than did 
Bs, and “compatible” pairings produced 
higher levels of therapist discomfort (Kemp 
& Carson, 1967). 

5 Kemp, D. E., & Sherman, D. Relationships between
medical students’ A-B Scale scores and their
evaluation of schizoid and neurotic patients. Paper
presented at the meeting of the Southwestern
Psychological Association, Oklahoma City, April 1965.

In a study by Sandler (1965), A and B
college subjects played two-person, non-zero-sum
games with partners whose patient-type
characteristics were experimentally and
independently varied by means of contrived AVOS
or TAS pregame self-descriptions and by
preprogramming of actual game behaviors as
either suspicious or trusting. In terms of
contrived pregame descriptions, “compatible”
pairings produced less favorable reactions to
the experiment, contrary to prediction. On
the other hand, As with suspiciously playing
partners and Bs with trustfully playing
partners reported expectations of reciprocal
cooperation and tended to perceive this
expectation as being realized.

Berzins, Seidman, and Welch (1970)
tested the hypothesis that As and Bs would
respond more effectively to cues associated
with extrapunitive and intropunitive modes
of handling anger, respectively. Male college
subjects wrote self-selected “helpful” responses
to brief, contrived, tape-recorded
communications and later made a number
of evaluative judgments. As predicted,
“compatible” pairings produced more satisfaction
with helping performance than opposite
pairings, particularly when the “therapists”
perceived themselves as less similar
to the “patients” on the anger expression
dichotomy. In addition, the intropunitive
“patient” was rated as significantly more
likable than the extrapunitive, the likability
ratings of the As for the intropunitive being
the main contributor to this difference.

Seidman (1969) asked male college sub- 
jects to respond verbally and helpfully at four 
points of interruption to videotaped enact- 
ments of “neurotic” and “schizoid” proto- 
types. Although the major focus of the study 
was the subjects’ behavior, Seidman also
reported a failure to replicate the findings of
Berzins and Seidman (1968) and Berzins
et al. (1970) on the subjects’ subjective
reactions. 

Scott (1968) assigned outpatients classified 
as neurotics to medical students undergoing
brief, mandatory psychiatric training. Fol- 
lowing an initial interview, the “therapists” 
completed a patient evaluation questionnaire 
and independent raters assessed segments of 
the tape-recorded sessions for behavioral 
responses. No relationships were found between
A-B status and any of the attitudinal
or behavioral dependent variables. Since patient
type was not varied, this study provides
no information on possible interactive effects. 
This negative finding is noteworthy because 
the medical students can be viewed as more 
similar to psychotherapists than college stu- 
dents, and the study focused on patient be- 
haviors obtained in a naturalistic treatment 
context.


Behavioral Responses 


The investigation of behavioral response 
correlates of the A-B dimension has also in- 
volved a variety of analogue designs. Carson, 
Harden, and Shows (1964) reported the re- 
sults of two experiments, the first requiring 
college males to respond “helpfully” in 
writing to letters ostensibly representative 
of AVOS and TAS symptomatology. Judges 
rated the subjects’ replies on six content di- 
mensions, one of which (“interpretative and 
depth-oriented”) showed a significant inter- 
action for “compatible” pairings. The second 
experiment utilized similarly categorized A and 
B subject-therapists, but in actual interviews 
with male college “patients” in whom sets of 
neurotic or schizoid modes of interaction were 
presumably induced by means of preinterview 
instructions. As predicted, As with “schizoid” 
and Bs with “neurotic” interviewees ex- 
plored a wider range of content areas in an 
information-gathering interview than did sub- 
jects in “incompatible” pairings. This last 
finding was not replicated, however, when 
seemingly slight modifications in the design 
were introduced (Jacob & Levine, 1968). 

Extrapolating from the research on cogni- 
tive style and communication theory, Trov- 
land (1968) reasoned that, given an oppor- 
tunity to express their positive or negative 
evaluations of another person to that person, 
As should do so more unambiguously than 
Bs due to the assumed greater sensitivity of 
As to social cues. Male college subjects re- 
sponded to a “patient,” who behaved in either 
hostile or friendly fashion in an individual 
interview. The hypothesis was supported in 
the dislike-hostile but not in the like-friendly 
condition. 

Berzins and Seidman (1969) reported the
results of another portion of the study described
earlier (Berzins & Seidman, 1968).
“Compatible” pairings produced, on the part
of the student-therapist, significantly longer
responses, more declarative as opposed to
interrogative remarks, and more positive
social-emotional reactions. However, the effects
were not independent of the order of
presentation of “patient” stimuli; when exposed
to the “other patients,” subjects continued
to respond in much the same manner
as they had to the first.⁶

Several of the studies described herein ex- 
amined behavioral as well as subjective re- 
actions. Thus, Seidman (1969) found that 
As and Bs in “compatible” pairings, relative 
to the opposite pairings, were characterized 
by significantly higher levels of rated em- 
pathic understanding and respect, in the 
client-centered therapy sense, and by verbal 
responses of longer duration. However, Scott’s 
(1968) A and B medical students did not 
differ on rated levels of empathy, warmth, 
and genuineness in an initial interview with 
neurotic outpatients. Kemp (1966) found 
no significant differences on four attributes of 
therapeutic intervention, including “warmth- 
acceptance.” Sandler (1965), in his non-zero- 
sum game task, reported that As and Bs in 
“compatible” pairings were suspicious, un- 
trustworthy, and competitive in their own 
game behavior. This effect was demonstrated 
only with partners who described themselves 
in either schizoid or neurotic terms, however, 
and was independent of the partner’s actual 
game behavior. The Berzins et al. (1970) 
study produced no statistically significant 
support for the hypothesis that As and Bs 
respond in a differentially effective manner to 
simulated extrapunitive and intrapunitive 
communications. That is, the previously re- 
ported “therapist-patient” interaction with 
respect to length of response and declarative 
versus interrogative form (Berzins & Seidman, 
1969) was not confirmed in this study. Fur- 
thermore, all subjects, regardless of A-B 
status, responded to the intrapunitive stimulus 
materials with significantly greater warmth, 
encouragement of independence, and urging 
for behavior change than they did to the 
extrapunitive stimuli.


6 J. I. Berzins, personal communication, February
13, 1969. 


Cognitive Style, Personality, and Perception
by Patient 


Differences in cognitive style and presum- 
ably related personality attributes have been 
the explicit focus of two analogue studies. 
Shows and Carson (1966), using the familiar 
male college subjects, obtained results on the 
Witkin RFT measure of field dependence-independence
which were generally consistent
with those of Pollack and Kiev (1963). 
Whereas the latter researchers were concerned 
with explaining the relative effectiveness of 
A therapists with schizophrenics, Shows and 
Carson interpreted their findings more broadly 
so as to include the possibility that patients 
may also differ in psychological differentia- 
tion, in which case contrasting therapist and 
patient cognitive style may lead to ineffective 
communication and understanding in therapy. 
However, the Embedded Figures Test of 
field dependence-independence did not dif- 
ferentiate between As and Bs, hence one may 
again question the meaning of differential
performance on the RFT.⁷

Dublin, Elton, and Berzins (1969) performed
a discriminant analysis on the responses
of A-B undergraduates to a personality
inventory and an aptitude-test battery.
As predicted, As as opposed to Bs were
characterized by a composite of higher femininity
and verbal aptitude but lower natural
science aptitude scores. These results were
seen as consistent with previous research on
the A-B variable, including the speculations
of Campbell, Stevens, Uhlenhuth, and Johansson
(1968) concerning the importance of an
intellectual-verbal versus practical-mechanical
difference in A-B therapist skills.

Only one study was found which addressed
itself to the question of patient perception of
the therapist. Regardless of the set (distrustful
or trustful) induced in them, interviewees
asked to describe A and B student interviewers
saw the Bs as relatively warmer
and more gentle (Carson & Harden⁸).

7 Drawing on a variety of laboratory perception
studies, Silverman (1967) offered an intriguing but
highly speculative analysis of personality trait and
perceptual style differences as they might relate to
the relationships established with schizophrenics by
A and B therapists. Unlike the present review,
Silverman assumed that the available data on
differential RFT performance reflect true differences
between therapists in perceptual organization.


Critique 


The lack of consistent evidence would 
appear to preclude interpretation of the pos- 
sible role of attitudinal reactions in mediating 
the assumed differential effectiveness of A and 
B therapist types with patient types. If one 
dichotomizes the data into those dealing with 
subjects’ subjective reactions (a) to their own
“helping” behavior, and (b) to the “patient,”
we find that in the former case there are two 
studies (Berzins & Seidman, 1968; Berzins 
et al., 1970) which support the notion of
"compatible" therapist-patient pairings, one 
(Seidman, 1969) which found no differences, 
and one (Kemp, 1966) which produced di- 
rectly contradictory results. With regard to 
attitudes toward the “patient,” three studies 
(Kemp, 1966; Kemp & Carson, 1967; Kemp 
& Sherman, see Footnote 5) reported negative 
subject attitudes in “compatible” pairings, 
one (Berzins & Seidman, 1968) found no 
interactive effects, another (Sandler, 1965) 
produced both confirmatory and contradictory 
evidence depending upon mode of stimulus 
presentation, while two others suggested both 
patient-type (Berzins et al., 1970) and
therapist-type (Kemp & Carson, 1967) main 
effects. 

A similar picture emerges from the data 
on differential behavioral responses. There 
are suggestions that "compatible" pairings 
may be characterized by therapist activity 
that is interpretative-declarative, wider rang- 
ing, more clearly communicated, of longer 
duration, and more accepting and respectful. 
However, these findings are based on single 
studies. Attempts to replicate the declarative 
nature of the “therapists’” statements have
either failed (Berzins et al., 1970) or have
been confounded by order effects (Berzins & 
Seidman, 1969), and the same is true for 
length of response. Differential range of 
verbal exploration was not found by Jacob 
and Levine (1968), and clarity of attitude
expression awaits further research confirmation.
“Therapist-offered conditions” of
warmth, acceptance, etc., as an interactive
effect were similarly found in only one study
(Seidman, 1969), and not in three others
(Berzins et al., 1970; Kemp, 1966; Sandler,
1965).

8 Carson, R. C., & Harden, J. A. The A-B therapist
“type” distinction and behavior in an interview
situation. Paper presented at the meeting of the
American Psychological Association, Los Angeles,
September 1964.

It would seem that methodological vari- 
ables play a major role in determining the 
results one obtains in studies of this sort. 
A broader question is whether the “treat- 
ment” behavior under examination in these 
studies has anything to do with actual thera- 
pist behavior in psychotherapy situations. 
Typically, the student subjects were not faced 
with another person with whom they could 
relate on a one-to-one, sequential, response- 
contingent basis, but instead listened to or 
read (often in groups) productions prepared 
by the researchers. In only one case (Scott, 
1968) did the stimuli emanate from actual 
patients; instead, subjects were usually pre- 
sented with verbal or written descriptions, 
or actors role playing a fictitious patient, 
sometimes unwittingly, on tape or in person.
In the studies which employed prerecorded
“psychiatric interviews,” not only the
“patient” but also the “therapist” was an
actor-confederate. Furthermore, subjects often 
responded in writing, at predetermined points 
of interruption and/or postexperimentally, 
either freely or to a questionnaire, usually 
under the constraints of brief time limits. 
It is submitted that a more fruitful strategy 
for the study of differential interpersonal 
response might involve the observation of 
properly selected subjects in actual interaction 
with bona fide clinical patients. Careful 
planning should render damage to either par- 
ticipant a distinctly unlikely event. Short of 
this, it seems evident that researchers in the 
area might well attempt to approximate 
natural conditions more closely than has been 
done in the past. 

Of still greater import is the likelihood that 
the studies in this section, even if inter- 
preted more optimistically, could reveal little 
or nothing about actual psychotherapists. In 
all of these studies, students were selected 
solely on the basis of extreme scores on an 
A-B scale, derived wholly or in substantial 
part from items on the SVIB Lawyer, CPA, 
Teacher Scales, which differentiated the Whitehorn
and Betz therapists. However, the
Whitehorn and Betz subjects were profes- 
sional psychotherapists, all of whom also 
scored high on the Physician, Psychologist, 
and Public Administrator Scales (Betz, 1962, 
p. 46). There is, therefore, no obvious reason 
to assume that college students who differ on 
the first group of scales will necessarily be 
analogous in potentially significant ways to 
actual therapists unless they are also similar 
on the last group. One hint that this may be 
an important oversight is supplied by Cohen's 
(1967) finding of several significant differ- 
ences between professional and "lay" thera- 
pists on expectations of psychotherapy process. 
Thus, the writer tends to share Cartwright’s 
(1968) lack of confidence that students who 
are untrained and uncommitted to the life role 
of a psychotherapist, whose value and interest 
similarity to those engaged in this work has 
not been established, can provide us with 
valuable information about A and B therapist 
differences in psychotherapeutic behavior. 

Related to the failure to control for interest
and value patterns are questions concerning 
the lack of a standardized A-B scale. As 
noted earlier, several different versions of 
A-B scales were used in the original clinical 
investigations. Variations included scores 
on the SVIB Lawyer, CPA, Printer, and 
Mathematics-Physical Science Teacher scales 
(Betz, 1958), on only two of the foregoing 
(Betz, 1963), on 10 of 23 SVIB items
(Whitehorn & Betz, 1960), and on all 23
SVIB items (McNair et al., 1962). Although
the 10-item version was cross-validated on a 
small sample, it was found to be unreliable 
for another sample by McNair et al. (1962), 
who proposed more internally consistent 13- 
item and, later, 15-item scales (Lorr & 
McNair, 1966). The research reviewed in this 
and the preceding section has compounded 
the problem. Investigators have variously em- 
ployed one or another of the 13-, 15-, or 23- 
item scales, slight adaptations of these, or 
Kemp's (1966) scale of the original 23 SVIB 
items plus 8 MMPI items. Quite clearly, dif- 
ferent scoring systems can give conflicting 
results on the same person, and indeed, such 
findings have been reported for Phipps Clinic 
(Stephens & Astrup, 1965), and other (Stoler,
1966) psychiatric staffs. Standardization of
the scale, therefore, remains an important 
but as yet neglected research effort. Further- 
more, in view of the significant proportion of 
females engaged in therapeutic activities, one 
may well question the utility of a scale that 
classifies nearly all of them as As on the 
basis of their lack of interest in mechanical- 
skill activities (Lorr & McNair, 1966). 
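
The point that different scoring systems can classify the same person differently can be illustrated with a minimal sketch. Everything in it is hypothetical: the item names, the keyed responses, the two item subsets, and the cutoff are invented for illustration and do not reproduce any published A-B scale or its scoring rules.

```python
from typing import Dict, List


def classify(responses: Dict[str, int], items: List[str], cutoff: float) -> str:
    """Label a respondent 'A' if the mean keyed score over `items` meets the cutoff, else 'B'."""
    score = sum(responses[item] for item in items) / len(items)
    return "A" if score >= cutoff else "B"


# Hypothetical keyed responses: 1 = answered in the A direction, 0 = in the B direction.
respondent = {
    "item01": 1, "item02": 1, "item03": 0,
    "item04": 0, "item05": 0, "item06": 1,
}

# Two hypothetical scale versions drawn from the same item pool.
short_form = ["item01", "item02", "item06"]
long_form = ["item01", "item02", "item03", "item04", "item05", "item06"]

short_label = classify(respondent, short_form, cutoff=0.6)  # mean score 1.0
long_label = classify(respondent, long_form, cutoff=0.6)    # mean score 0.5
print(short_label, long_label)
```

Under these assumptions the same respondent falls on opposite sides of the cutoff depending only on which items are scored, which is the kind of discrepancy that a standardized scale would be intended to remove.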

Attention may also be drawn to the common
use, in the laboratory research, of
"patient" stimuli chosen to represent the 
Phillips and Rabinovitch (1958) AVOS and 
TAS symptom clusters, a practice apparently 
initiated by Kemp (1963, 1966). Employ- 
ment of this strategy requires the assumption 
that the relevant aspects of schizophrenia and 
neurosis can be translated with minimal dis- 
tortion into social withdrawal and intrapuni- 
tive syndromes, respectively. This may have 
been a premature simplification (Bordin, 
1965). As Buss (1966) pointed out, accept- 
ance of the AVOS-TAS system, which at- 
tempts to classify symptoms on the basis of 
their implied social maturity level, depends 
upon how closely it fits what is known about 
development and whether it can be estab- 
lished that psychopathology represents regres- 
sion to earlier developmental modes. The 
evidence to date suggests that social maturity 
is only one of many proposed dimensions of 
psychopathology, the relative importance of 
which remains undetermined. Therefore, there 
is no compelling reason to believe that the 
laboratory studies which rest upon the AVOS- 
TAS assumption are necessarily related to the 
features of psychological dysfunction sug- 
gested by the original clinical investigations. 
The search for personality correlates has 
also added little to our understanding of the 
A-B dimension. The general absence of per- 
sonality scale findings in the study of Dublin 
et al. (1969) suggests, as it did to the authors,
that whatever response dispositions
may be differentially characteristic of A and
B therapists may be situation-specific. This 
possibility has recently been recognized by 
Betz (1967), who, in contrast to her earlier 
position, now proposes that it is the “fit” 
between therapist and patient, and not the 
therapist’s personality alone, that determines 
the therapist’s mode of therapeutic behavior.
Thus, the content of the A-B scale (relative 
interest in mechanical activities) continues 
to defy conceptualization in terms seemingly 
relevant to psychotherapy. It would appear, 
therefore, that the future of A-B research 
may lie in the investigation of behavioral re- 
sponse differences, such as verbal and non- 
verbal communicative styles in dyadic inter- 
actions, rather than in the pursuit of broad 
personality trait differences. 

Finally, it may be noted that the present 
review has not focused upon questions of 
internal validity, for instance, the provision 
and adequacy of control groups, confounding 
of manipulations, etc. Thus, the studies fail 
to provide consistent evidence even if the 
results are taken at face value. 


CONCLUSIONS AND IMPLICATIONS 


It is apparent that the A-B therapist variable
has proven to be an elusive phenomenon
to study. On the assumption that it is elusive 
and not illusory, the foregoing analysis has 
been directed to the identification of con- 
ceptual and methodological limitations that 
have hindered the development of research in 
the area. It has been argued that the bulk 
of the research has been based on a number 
of tenuous assumptions which clearly violate 
guidelines for analogue research on psycho- 
therapy. Bordin (1965), for example, offered 
three rules of simplification. In effect, these 
rules caution the laboratory researcher (a) to 
establish conditions as nearly equivalent to 
those in the natural setting as possible, which 
requires (b) a great deal of knowledge about 
the natural phenomena prior to simplifica- 
tion, or, in the absence of such knowledge, 
(c) the early establishing of empirical bridges 
between the simplification and the naturalistic 
phenomena. 

The foregoing review indicates that no such 
principles have guided the efforts of investi- 
gators of the A-B variable. The laboratory 
research has suffered most clearly from the 
premature inference that interactional effects 
do, in fact, obtain under natural psycho- 
therapy conditions. However, the clinical evi- 
dence for the interaction hypothesis was 
shown to be weak, at best, and the analogue 
research has contributed little in the way
of additional support or clarification. It
seems likely that the simplifications of using 
college student “therapists” and artificial pa- 
tient stimuli rendered the A-B variable more 
amenable to investigation than might other- 
wise have been the case. That such investiga- 
tions have not proved fruitful illustrates a 
pitfall subtly suggested by Levy (1961), who 
noted that research interest and activity is 
often controlled to a greater extent by tech- 
niques rather than problems. 

The present analysis of how the research 
became misdirected leads to the question of 
what can be done about it. It would appear 
evident that a necessary prerequisite to 
further laboratory research is an adequate 
demonstration of A-B therapist-type patient- 
type interaction effects in a natural psycho- 
therapy setting. The interaction hypothesis 
must be evidenced, in the same clinical study 
employing multiple criteria and both “types” 
of therapist and patient, before we can rea- 
sonably expect simulated research to ad- 
vance our knowledge of the A-B dimension. 
Should this critical study produce the pre- 
dicted interactional effects, it would then be 
urged that subsequent laboratory research 
take heed of the methodological suggestions 
outlined herein. 

Finally, it should be noted that recent 
changes in hospital situations have important 
implications for the critical study herein pro-
posed. The original Phipps Clinic research
was conducted entirely with pre-1951 thera- 
pist and patient samples, when improvement 
rates for schizophrenics approximated 50%, 
sufficient numbers of therapists were classified 
as As or Bs by the SVIB, and “psycho- 
therapy only” was a common treatment strat- 
egy. Since that time, however, average annual 
improvement rates have exceeded 70%, rela- 
tively few therapists have scored as Bs on 
the SVIB, and ataractic drugs have become 
an integral part of the Phipps’ treatment 
program (Betz, 1967). No ready explanation 
for the apparent influx of As suggests itself, 
and it is assumed that sufficient numbers of 
the Bs are generally available for further 
research. Furthermore, it seems unlikely that
the proportionate increase in A therapists is 
causally related to the elevated success rates 
at the Phipps Clinic, since similar improve-
ment rates and drug usage are now typical
nationally, undoubtedly reflecting the symp-
tom-alleviating function of the ataractics
which renders the patient more accessible to 
psychotherapy (Coleman, 1964). 

This being the case, and noting that the 
present 70% improvement rate is comparable 
to that attained by A therapists in "psycho-
therapy only" in the Phipps Clinic studies,
at least two speculations on the effects of 
drug treatment on A-B success rates are 
possible. First, the ataractics may operate in 
such a way as to render insignificant the 
special qualities of a given therapist, result- 
ing in high improvement rates regardless of 
A-B status, by virtue of the symptom- 
reducing function of the drugs. The pre- 
eminence of this possibility is suggested by 
Betz's (1962) reference to the clinical style 
of Bs, but not As, as including a focus on 
the modification of the patient's symptoma- 
tology. A second hypothesis is that As and 
Bs both achieve greater success with drug- 
treated than with non-drug-treated patients, 
but the discrepancy between their success rates 
remains of similar magnitude as before. It is 
this latter possibility which should be con- 
vincingly demonstrated in a clinical study as 
a prerequisite to the proposed test of the 
interaction hypothesis, as otherwise the issue 
becomes meaningless. That is to say, there 
is little utility in implementing the necessarily 
complex critical study of the interaction 
hypothesis unless it can be shown that the 
phenomenon still exists under modern tech- 
niques of schizophrenic treatment, which now 
result in improvement rates quite comparable 
to those achieved earlier by A therapists in 
"psychotherapy only." It is entirely possible 
that, with the introduction of ataractic drugs 
and other innovations, the key to the A-B 
dimension as a relevant therapist variable 
has been lost. 
ERRATA


In the article "Some Properties of Ipsative, Normative, and Forced-Choice 
Normative Measures" by Lou E. Hicks in the September 1970 issue, the last line 
of the second paragraph of the left-hand column of page 169 should read as follows: 
“All of these measures yield scores which are originally obtained possessing some 
ipsative properties; none yield transformed normative scores." 
Three general issues of dimensionality, validity, and content of ratings of
managerial performance are examined in terms of “relevance to the ultimate 
criterion.” It is argued that the multitrait-multimethod design yields the best 
evidence for investigating these issues. An analysis of variance model with 
computational formulas for the sums of squares and variance components based 
on the correlation matrix is proposed for the multitrait-multimethod situation. 
Using this model, data from two studies are examined and compared, relative 
to convergent and discriminant validity, method bias (halo), and error vari- 
ance. It appears that this model and the indexes derived from it provide a 
more simplified and interpretable technique for analyzing and summarizing
multitrait-multimethod data. In addition, a procedure to improve performance
ratings based on partitioning the original matrix is illustrated with an example
from the larger data set.


One of the most persistent difficulties in
industrial psychological research has been
labeled the “criterion problem.” Although
this has been a catchall phrase for any mis-
behavior of criterion measures, the major
issue seems to be whether the specific “here-
and-now” measures truly represent job per-
formance in some ultimate sense. With regard
to judgmental ratings of job performance, this
problem is further complicated by having
first to decide how many traits are to be
rated, and second, which ones are most
indicative of job performance. The former
issue is one of dimensionality of job perform-
ance, while the latter is one of validity. The
two are related, of course.

It seems clear, as Dunnette (1963a, 1963b)
has argued, that more than one dimension
should be used. The exact number to include
is not established, however, and seems to rest
in part on the purposes of the research and
the biases of the investigator. The problem
then becomes one of scientific and practical
efficiency. Practical efficiency involves cost
and epus while scientific efficiency in- 
volves achieving maximum information with 
the smallest number of traits. There is a 
cutoff point at which an increase in the 
number of rating dimensions does not sig- 
nificantly add to the variance accounted for. 

Theoretically, the general problem of the 
validity of criteria first received thorough dis-
cussion by Thorndike (1949) in terms of
"relevance" of criterion measures. He stated:


relevance to the ultimate goal is the prime essential
of a criterion measure. A criterion measure is rele-
vant as far as the knowledges, skills, and basic apti-
tudes required for success on it (the measure) are
the same as those required for performance on the
ultimate task [p. 125].


Thus, in order to determine the relevance 
of a criterion, one must have some notion of 
what is the ultimate goal or criterion. Vari- 
ous writers have offered their conceptualiza- 
tions of this ultimate goal and its relation 
to the more immediate measures of job per- 
formance. Thorndike (1949) divided criteria 
into three categories: immediate, intermedi- 
ate, and ultimate. Immediate and intermediate 
criteria are the ones that are present, usable, 
and in most cases, measurable, for instance, 
production records, gross sales, performance 
ratings; whereas the ultimate criterion is the 
final goal of an organization and thus, is 
usually unavailable. Analogous conceptualiza-
tions have been presented in terms of ultimate
criteria, penultimate criteria, and measures of
current organizational functioning (Seashore, 
1964), and conceptual criteria versus criterion 
performance (Astin, 1964). The different 
labels used among these classifications appear 
mainly to represent differences in orientations. 
However, the notion that the more immediate 
measures must be relevant to a more final 
goal—a more abstract, higher order criterion 
not usually amenable to measurement—is 
treated consistently by the three authors 
cited.* 

Based on the ideas previously discussed, 
one can conclude that the ultimate criterion 
can best be described as a psychological con- 
struct. Thus, the process of determining the 
relevance of the immediate to the ultimate 
criterion becomes one of construct validation. 
That is, the assessment of the relevance of 
our measures of job performance involves 
determining the “meaning of the measure.” 
In order to make a decision cognizant of 
scientific efficiency, as noted earlier, it is 
thus necessary to ascertain whether the 
ratings of personal and/or job-related traits 
of an individual reflect his job performance, 
that is, the construct validity of the ratings. 

The establishment of construct validity 
has received considerable discussion (e.g., 
American Psychological Association, 1966; 
Bechtoldt, 1959; Cronbach & Meehl, 1955; 
Loevinger, 1957; Peak, 1953; Royce, 1963), 
but certainly one of the best procedures avail- 
able for operationally assessing construct 
validity is the multitrait-multimethod scheme 
(Campbell & Fiske, 1959). Using this ap- 
proach, it is possible to demonstrate empiri- 
cally the convergent and discriminant validity 
of the ratings. If we obtain performance-trait 
ratings on an individual from raters at dif- 
ferent organizational levels (e.g., superiors 
and subordinates), then the amount of agree- 
ment between raters on the same traits 
(convergent validity) and different traits
(discriminant validity) can be determined.
That is, with the multirater approach, one
can obtain judgments of the ratee’s behavior
from different and relatively independent per-
spectives in the organization. With this in-
formation, the investigator can decide (using
operationally defined criteria for construct
validity) whether he has, in fact, captured
significant job-performance variance based on
the different raters’ agreement and/or dis-
agreement on the stimulus person’s on-the-job
behavior.

* It should be emphasized that we are speaking
to the issue of relevance to an ultimate criterion of
supervisory performance, not necessarily the ultimate
criterion of organizational functioning. Although
organizational effectiveness will depend in part on
supervisory effectiveness, the specification of rele-
vance of the ratings refers to an abstract notion of
managerial behavior that leads to successful job
performance.

Campbell and Fiske (1959) argued that 
evidence for convergent validity exists when 
the correlations between raters on the same 
traits (the validity diagonal entries) are sig- 
nificantly different from zero. Evidence for 
discriminant validity is threefold. First, the 
correlations in the validity diagonal should 
be higher than those in the same column and 
row in which neither trait nor rater are in 
common. Second, the values in the validity 
diagonal should be higher than those cor- 
relations between that trait and other traits 
with a common rater. Third, the pattern of 
trait interrelationships should be the same 
within and between raters. 
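
The first two discriminant comparisons described above can be tallied mechanically. What follows is only a minimal sketch in Python, not a procedure taken from Campbell and Fiske (1959): the function names, the assumption that variables are ordered rater by rater, and the idealized correlations are all introduced here for illustration. It reports the proportion of comparisons in which a validity-diagonal value exceeds (1) the values in its row and column that share neither trait nor rater and (2) the values for other traits rated by the same rater.

```python
def build_mtmm(n_traits, n_sources, mono, validity, hetero):
    """Build an idealized MTMM correlation matrix (hypothetical helper).

    Variables are ordered rater by rater: index = source * n_traits + trait.
    mono     = correlation between different traits rated by the same source
    validity = correlation between different sources rating the same trait
    hetero   = correlation when neither trait nor source is shared
    """
    size = n_traits * n_sources
    R = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            same_trait = a % n_traits == b % n_traits
            same_source = a // n_traits == b // n_traits
            if same_trait and same_source:
                R[a][b] = 1.0
            elif same_source:
                R[a][b] = mono
            elif same_trait:
                R[a][b] = validity
            else:
                R[a][b] = hetero
    return R


def discriminant_checks(R, n_traits, n_sources):
    """Return the proportion of comparisons satisfied for criteria 1 and 2."""
    n, m = n_traits, n_sources

    def idx(source, trait):
        return source * n + trait

    c1_pass = c1_total = c2_pass = c2_total = 0
    for k in range(m):
        for l in range(m):
            if k == l:
                continue
            for j in range(n):
                validity = R[idx(k, j)][idx(l, j)]
                for j2 in range(n):
                    if j2 == j:
                        continue
                    # Criterion 1: same row and same column, neither trait nor rater shared.
                    for other in (R[idx(k, j)][idx(l, j2)], R[idx(k, j2)][idx(l, j)]):
                        c1_total += 1
                        c1_pass += validity > other
                    # Criterion 2: same trait paired with other traits, common rater k.
                    c2_total += 1
                    c2_pass += validity > R[idx(k, j)][idx(k, j2)]
    return c1_pass / c1_total, c2_pass / c2_total


# Illustrative values only: validity (.4) exceeds hetero (.2) but not mono (.5).
R = build_mtmm(3, 2, mono=0.5, validity=0.4, hetero=0.2)
first, second = discriminant_checks(R, 3, 2)
print(first, second)
```

With these made-up values the first criterion is met in every comparison and the second in none, the pattern one would expect when same-rater ("halo") correlations exceed cross-rater agreement.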

Several applications of the multitrait-multi- 
method scheme of analysis have been noted. 
Campbell and Fiske (1959) cited some early 
studies amenable to a  multitrait-multi- 
method analysis. Research since then has 
shown applications of the approach to such 
diverse areas as personality traits (Mason, 
1963), creative ability (Goodman, Furcon, 
& Rose, 1969), job satisfaction (Evans, 1969; 
Locke, Smith, Kendall, Hulin, & Miller, 
1964), need satisfaction (Alderfer, 1967),
job evaluation (McCormick, 1964), and per- 
formance ratings (Lawler, 1967; Reilly, 
1966; Tucker, Cline, & Schmitt, 1967). 

However, only one of the studies cited 
(Lawler, 1967) dealt specifically with ratings 
of managerial performance, the focus of the 
present paper. Superior, peer, and self-ratings 
of three traits were obtained on middle- and 
top-level managers. The superior and peer 
ratings show good convergent and discrimi-
nant validity, but the self-ratings show little
of either. This result is not surprising in
light of the act-action dichotomy of meaning
(Kaplan, 1964), which is analogous to a
phenomenological-behavioral distinction. A
particular bit of behavior may have a dif-
ferent meaning for the actor (act meaning)
than for the observer (action meaning). This
leads the present writers to question the
value of including self-ratings in multitrait-
multimethod analysis. It does not mean that
self-ratings have no validity, but rather that
they appear more relevant for training and
development than for evaluation per se.

* These studies are not listed here since they are
available in the original article.

It seems that a better additional source 
of ratings for use in a multitrait-multimethod 
analysis would be the subordinate since 
another organizational level would be in- 
cluded. Lawler (1967) noted that the em- 
pirical question of how subordinates’ ratings 
are related to other sources has not been 
answered. A partial answer is provided in a 
subsequent discussion. 

Nevertheless, once the investigator is sure 
he has captured a significant portion of job- 
performance variance, he still faces what can 
best be described as the issue of the relative 
representativeness of traits. This problem 
has been generically labeled the "content 
issue" in rating scales. Prior to the implemen- 
tation of a rating program for industrial 
performance appraisal and/or research, a 
judgment must be made concerning the con- 
tent or stimuli to be rated. This content is 
best conceptualized as extending along a 
continuum ranging from subjective to objec- 
tive, abstract to concrete, or personality to 
performance. The generally accepted view- 
point is that overt or performance-oriented 
behaviors are more relevant to job perform- 
ance than the covert personal ones, although 
this is not reflected in actual use of ratings 
where ratings of highly abstract personality 
traits still linger tenaciously (National Indus- 
trial Conference Board, 1964). 

Various writers (Barrett, 1966; Bittner, 
1948; Ghiselli & Brown, 1955; Odiorne, 
1963; Schultz & Siegel, 1961; Symonds,
1931) have concluded that the proper content
for rating scales is job-related objective
traits. A typical summary is presented by
Barrett (1966): 


The major attention is best directed to the product 
of a man’s effort, and whenever possible he should 
be studied in terms of what he accomplishes. But 
when products are inaccessible, performance is sug- 
gested as being the next-best level of abstraction 
to deal with, while pure personality variables have 
little if any relevance to the performance measure- 
ment task [pp. 38-39; italics added]. 


Following this logic, it would appear that 
personality dimensions should account for 
little of the variance in job performance.* 
The senior author of this paper has completed 
a review of the literature (unpublished)
relevant to this question dating from 1920 
on and has discovered three basic kinds of 
empirical evidence: (a) interrater and re- 
rating reliability, (b) validation against an- 
other criterion, and (c) multitrait-multi- 
method research. The conclusions of this 
review were (a) the content issue appears to 
be far from settled; (b) the multitrait-
multimethod analysis appears to provide the
only sufficient evidence for relevance of
rating scale content; and (c) one should
use any trait in a rating form if it helps
account for a significant portion of the job
performance variance. It is questionable
whether the empirical evidence to date would 
support the view that one should abandon 
the evaluation of an employee’s personality 
traits because they lack relevance for per- 
formance appraisal. The central question is 
whether an employee’s personality affects his 
performance on the job, and consequently, 
whether the investigator can accurately as- 
sess and differentiate between effective and 
ineffective employees on these variables. That 
is, will one be able to account for more of
the job performance variance in the ratings
of managers by including personality traits,
or should one employ a “critical incidents”
technique that involves only performance
dimensions?

As before, the best way to assess this rela-
tive representativeness of traits is with the
multitrait-multimethod scheme. The investi-
gator can judge thereby which traits possess
more validity (are more relevant) for job
performance, at least in terms of judgmental
data. However, using Campbell and Fiske's
(1959) four criteria for construct validity, it
is quite difficult to make relative judgments.
Thus, it is shown subsequently that it is
possible to summarize the correlations in a
multitrait-multimethod analysis in analysis
of variance terms and estimate variance
components. This method of summarization
of the data appears to be more interpretable,
in general, and useful particularly for the
purpose of this report: relative comparisons.

* Of course, the difficulty in rating personality
traits reliably has complicated the problem, and
perhaps this difficulty has led to the tacit acceptance
of the superiority of performance traits.

This paper asks three general questions about 
superiors’ and subordinates’ ratings of the job 
performance of middle managers. First, are 
they relevant—do they demonstrate construct 
validity? Second, how many dimensions 
(traits) are necessary to be rated in order to 
fulfill the requirements for scientific effi- 
ciency? Third, which type of rating dimen- 
sion—performance or personality—has greater 
relevance for managerial job performance? 
In short, what is the meaning, dimensional-
ity, and content of ratings of middle manager
performance?


TABLE 1
MULTITRAIT-MULTIRATER MATRIX (a)

[Matrix entries lost in extraction; recoverable labels: Traits, Superior.]

a Reprinted from an article by E. E. Lawler published in the October 1967 Journal of Applied Psy-
chology. Copyright by the American Psychological Association,
Inc., 1967.


ANALYSIS OF VARIANCE TECHNIQUE FOR 
MULTITRAIT-MULTIMETHOD DATA 


The examination of multitrait-multimethod 
data using the four desiderata or criteria 
suggested by Campbell and Fiske (1959) is 
inferential, implicit, and in the case of large 
matrices, quite awkward. For example, exami-
nation of one matrix in this paper (60 × 60;
i.e., 3 raters and 20 traits) using their cri-
teria would be quite tedious. In addition, the
problem of comparison of effects within and 
between studies is difficult, if not impossible. 
Clearly, a technique is needed to summarize 
the data in a more explicit, interpretable, and 
comparable form. 

Stanley (1961) has proposed that multi- 
trait-multimethod data can be analyzed by 
analysis of variance considering the situation 
a three-way factorial. Using average vari- 
ances and covariances, he has shown that 
the mean squares for the model can be esti- 
mated. Using analogous logic and derivations, 
Wolins (Footnote 6) and Zyzanski (1962) have shown
the mean squares can be estimated using 
averages of blocks of correlations in the 
multitrait-multimethod matrix. The computa- 
tional procedures using average correlations 
to estimate the mean squares and variance 
components of the analysis of variance model 
are contained in Boruch, Larkin, Wolins, and 
MacKinney (1970). Since this technique has 
not been used extensively, an example from 
the literature is presented (Lawler, 1967). 
Table 1 presents the matrix from this study. 

Considering managers as random and traits 
and sources (raters) as fixed, the following 
three-way classification model is hypothesized 
to describe the data: 


$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \epsilon_{ijk}$$


6 Unpublished technical report entitled “A pro- 
cedure for estimating the amount of trait, method, 
and error variance attributable to a measure” (sup- 
ported by project C-998, Contract Nonr 3-20-001) 
Washington, D. C., Office of Education, U. S. De- 
partment of Health, Education, and Welfare, 1964 
(Donald T. Campbell, Principal Investigator.) 
where

$Y_{ijk}$ = rating of manager $i$ for trait $j$ by source $k$;
$\alpha_i$ = effect of manager $i$, $i = 1, 2, \ldots, 113$;
$\beta_j$ = effect of trait $j$, $j = 1, 2, 3$;
$\gamma_k$ = effect of source $k$, $k = 1, 2, 3$;
$\epsilon_{ijk} \sim \mathrm{NID}(0, \sigma^2_E)$.


Although it is possible to estimate all
effects in the model, the psychologist is pri-
marily interested in only four sources of vari-
ance to provide the essential information in
the multitrait-multirater situation. These are:
(a) person (manager) variance, which indi-
cates the overall amount of agreement
(convergent validity) on managers over
sources (raters) and traits; (b) manager by
trait variance, which indicates the degree of
rated discriminations on traits by managers
(discriminant validity); (c) manager by
source variance, which indicates the amount
of source bias (“halo”) in the rating situation;
and (d) error. The estimation and interpreta-
tion of these effects and variance components
is based on consideration of the sources of
covariation in the multitrait-multirater situ-
ation. As Stanley (1961) pointed out, con-
sidering ratees as rows in a ratee-rater-trait
matrix, the repeated judgments over ratees
make possible three sources of covariance:
(a) within each rater across traits, (b)
within each trait across raters, and (c) across
both raters and traits. Thus, in the analysis
of variance model proposed to describe the
data, one can investigate halo effects, dis-
criminant validity, and person effects or
convergent validity, which correspond to the
preceding three sources of covariation.


TABLE 2

COMPUTATIONS FOR SUMS OF SQUARES AND SPECIFICATION OF EXPECTED
MEAN SQUARES FOR LAWLER 1967 DATA (a)

Source | df | SS | Expected MS
Manager (M) | $N - 1$ | $Nnm(\bar{r}_0)$ | $\sigma^2_E + nm\,\sigma^2_M$
M × Trait (T) | $(N - 1)(n - 1)$ | $Nnm(\bar{r}_{wt} - \bar{r}_0)$ | $\sigma^2_E + m\,\sigma^2_{M \times T}$
M × Source (S) | $(N - 1)(m - 1)$ | $Nnm(\bar{r}_{ws} - \bar{r}_0)$ | $\sigma^2_E + n\,\sigma^2_{M \times S}$
Error (E) | $(N - 1)(n - 1)(m - 1)$ | $Nnm(1 - \bar{r}_{wt} - \bar{r}_{ws} + \bar{r}_0)$ | $\sigma^2_E$

Note. $\bar{r}_0$ = average correlation of all of the elements in the matrix,
including the ones in the main diagonals; $\bar{r}_{wt}$ = average
correlation between sources within traits; computationally, the sum of the
validity diagonals times two plus $nm$, divided by $nm^2$;
$\bar{r}_{ws}$ = average correlation between traits within sources;
computationally, the sum of the monomethod-heterotrait triangles times
two plus $nm$, divided by $n^2m$; $n$ = number of traits;
$m$ = number of sources.

a Modified with permission from Boruch et al. (1970).


The computations for the sums of squares 
of these four effects from the correlation 
matrix, the degrees of freedom, and the 
expected mean squares are presented in
Table 2. 

Using the correlations from Table 1, the 
analysis of variance in Table 3 was computed. 
Since it is of interest to compare the amount 
of variance due to each source in Table 3, 
variance components were also computed. 
The variance components allow one to make
inferences about meaningful effects, par-
ticularly relative to unexplained variance
(error), while controlling the sample size.
Estimates for the variance components were 
calculated (see Table 4). 
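
The route from a correlation matrix to mean squares and variance components can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' program: it assumes the sums of squares take the form N·n·m times the average-correlation contrasts and that the components are obtained by dividing mean-square differences by nm, m, and n, as the expected mean squares imply. The function names, the rater-by-rater ordering of variables, and the idealized correlations are assumptions made here.

```python
def build_mtmm(n_traits, n_sources, mono, validity, hetero):
    """Idealized MTMM matrix (hypothetical helper); index = source * n_traits + trait."""
    size = n_traits * n_sources
    R = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            same_trait = a % n_traits == b % n_traits
            same_source = a // n_traits == b // n_traits
            if same_trait and same_source:
                R[a][b] = 1.0
            elif same_source:
                R[a][b] = mono
            elif same_trait:
                R[a][b] = validity
            else:
                R[a][b] = hetero
    return R


def mtmm_anova(R, n_traits, n_sources, n_ratees):
    """Mean squares, F ratios, and variance components from average correlations."""
    n, m, N = n_traits, n_sources, n_ratees
    size = n * m
    total = within_trait = within_source = 0.0
    for a in range(size):
        for b in range(size):
            total += R[a][b]
            if a % n == b % n:        # same trait, any pair of sources (diagonal included)
                within_trait += R[a][b]
            if a // n == b // n:      # same source, any pair of traits (diagonal included)
                within_source += R[a][b]
    r0 = total / size ** 2
    r_wt = within_trait / (n * m ** 2)
    r_ws = within_source / (n ** 2 * m)

    scale = N * n * m
    ss = {"M": scale * r0,
          "MxT": scale * (r_wt - r0),
          "MxS": scale * (r_ws - r0),
          "E": scale * (1 - r_wt - r_ws + r0)}
    df = {"M": N - 1,
          "MxT": (N - 1) * (n - 1),
          "MxS": (N - 1) * (m - 1),
          "E": (N - 1) * (n - 1) * (m - 1)}
    ms = {key: ss[key] / df[key] for key in ss}
    f_ratio = {key: ms[key] / ms["E"] for key in ("M", "MxT", "MxS")}
    vc = {"M": (ms["M"] - ms["E"]) / (n * m),
          "MxT": (ms["MxT"] - ms["E"]) / m,
          "MxS": (ms["MxS"] - ms["E"]) / n,
          "E": ms["E"]}
    return {"df": df, "ms": ms, "F": f_ratio, "vc": vc}


# Illustrative input only; these are not Lawler's (1967) correlations.
R = build_mtmm(3, 2, mono=0.5, validity=0.4, hetero=0.2)
result = mtmm_anova(R, n_traits=3, n_sources=2, n_ratees=101)
for name, value in result["vc"].items():
    print(name, round(value, 3))
```

For an idealized matrix of this kind, the Manager × Trait component works out to N/(N − 1) times the gap between the validity and hetero-hetero correlations, and the Manager × Source component to N/(N − 1) times the gap between the same-rater and hetero-hetero correlations, which is a convenient check on the arithmetic.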

As indicated in Table 3, the results of the
significance tests on the main effect and
interactions indicate that each source of vari-
ance is significant (p < .001). These results
can be interpreted in the following manner.
There is differentiation among managers at-
tributable to the measuring instrument used,
that is, person variance or convergent valid-
ity. However, the equally large Manager ×
Source effect indicates there is substantial
method bias ("halo" by raters) confounding
the first result. Further, although the Man-
ager × Trait interaction indicates ordering of
managers differently on different traits, it is
the weakest effect, less than one-half the
size of the other two. Thus, it appears there
is some evidence for discriminant validity,
but much better convergent validity, al-
though this latter evidence must be tempered
by the equally large source "bias."


TABLE 3

ANALYSIS OF VARIANCE OF CORRELATIONS
FROM TABLE 1

Source | df | MS | F | Variance component
Manager (M) | 112 | 3.105 | 7.204* | [illegible]
M × Trait (T) | 224 | [illegible] | 1.886* | .12
M × Source (S) | 224 | 1.312 | 3.044* | .294
Error | 448 | .431 | |

* p < .001.

The relatively equal contribution of the 
evaluation of the managers and the “halo” to 
the ratings would seem to indicate that al- 
though there is good convergent validity in 
the matrix, it is method or source-bound. 
The weaker Trait X Manager effect would 
mean there is either less (relative to con- 
vergent) discriminant validity overall, or 
else it varies from good to poor across traits. 
Inspection of the original matrix (Table 1) 
reveals these judgments about convergent and 
discriminant validity to be accurate. Of 
course, the actual estimation of source bias 
is not possible from merely scanning the 
original matrix, but must be quantified as 
above. 

However, examination of these variance 
components in this manner does not tell the 
whole story. Note that although the F ratios 
are significant, they are not very large, and 
the degrees of freedom are rather large for 
the significance tests. Further, the magnitude 
of the error variance is larger than any of 
the effects, which means the ratings are more 
subject to unknown sources of variance than 
they are to the recognized sources, thus ques- 
tioning the practical significance of the effects. 


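TABLE 4

Source | Variance component
Manager (M) | $(MS_{M} - MS_{M \times T \times S}) / nm$
M × Trait (T) | $(MS_{M \times T} - MS_{M \times T \times S}) / m$
M × Source (S) | $(MS_{M \times S} - MS_{M \times T \times S}) / n$
Error | $MS_{M \times T \times S}$

Note. n = number of traits; m = number of sources.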


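TABLE 5

DIRECT COMPUTATION OF VARIANCE COMPONENTS

Source | Variance component
Manager (M) | $\bar{r}_{hh} + (\bar{r}_{vd} - \bar{r}_{hh})/n + (\bar{r}_{ws} - \bar{r}_{hh})/m$
M × Trait (T) | $\bar{r}_{vd} - \bar{r}_{hh}$
M × Source (S) | $\bar{r}_{ws} - \bar{r}_{hh}$
Error | $1 - \bar{r}_{vd} - \bar{r}_{ws} + \bar{r}_{hh}$

Note. $\bar{r}_{hh}$ = average correlation of hetero-hetero triangles;
$\bar{r}_{vd}$ = average correlation of validity diagonals; $\bar{r}_{ws}$ = average
correlation of monomethod-heterotrait triangles; n = number
of traits; m = number of sources (raters).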


However, it appears one could increase both 
convergent and discriminant validity by add- 
ing across raters in this situation. 

The interpretations made here based on 
this technique are not substantially different 
from those made by Lawler (1967), with the 
exception of the identification of the size of 
the source bias and error variance. In order 
to extract full information from the multi- 
trait-multimethod investigation for decision 
purposes, the researcher should consider the 
matrix of correlations and the variance com-
ponents as complementary. The advantages of
the analysis of variance technique are (a) it
is a more efficient manner to summarize and
interpret the evidence for discriminant and
convergent validity, particularly if the matrix
is fairly large; (b) the validity information
is more explicit (less judgmental) and
quantifiable by this method; (c) it allows 
the estimation of method bias and the amount 
of sampling variance in the research; and 
(d) the relative strength of the effects can 
be obtained. 


Some Refinements to the Analysis of Variance 
Technique 


Although the previous computational pro-
cedure as described by Boruch et al. (1970)
is fairly straightforward, the variance com-
ponents for a given data set can be estimated
directly. These estimations are shown in
Table 5 (see Footnote 7). The advantage to these calcula-
tions is that the investigator can go directly
from the correlation matrix to the variance
components with fewer computations.

7 It can be shown that the expected values of the
average correlations in Table 5 are the respective
variance components. The algebraic deriva-
tions of these estimates are beyond the scope of
this paper but may be obtained from the third
author.


TABLE 6
MULTITRAIT-MULTIMETHOD INDEXES

Source | Formula | Values (a)
Manager (M) | $VC_{M} / (VC_{M} + VC_{E})$ | [illegible]
M × Trait (T) | $VC_{M \times T} / (VC_{M \times T} + VC_{E})$ | .23
M × Source (S) | $VC_{M \times S} / (VC_{M \times S} + VC_{E})$ | .41

Note. VC = variance component.

a The distribution of these intraclass correlations is approxi-
mately the same as that of a Pearson product-moment coefficient.
Thus, for large samples, confidence intervals can be ob-
tained by transforming the scale to Fisher's Z.
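
A direct computation of this kind can be sketched as follows. The sketch is an assumption-laden illustration, not the authors' derivation: it uses the expressions that the mean-square route implies when N/(N − 1) is treated as 1, and the function names, rater-by-rater ordering of variables, and example correlations are all hypothetical.

```python
def build_mtmm(n_traits, n_sources, mono, validity, hetero):
    """Idealized MTMM matrix (hypothetical helper); index = source * n_traits + trait."""
    size = n_traits * n_sources
    R = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            same_trait = a % n_traits == b % n_traits
            same_source = a // n_traits == b // n_traits
            if same_trait and same_source:
                R[a][b] = 1.0
            elif same_source:
                R[a][b] = mono
            elif same_trait:
                R[a][b] = validity
            else:
                R[a][b] = hetero
    return R


def direct_components(R, n_traits, n_sources):
    """Estimate variance components from three average off-diagonal correlations."""
    n, m = n_traits, n_sources
    size = n * m
    sums = {"hetero": 0.0, "validity": 0.0, "mono": 0.0}
    counts = {"hetero": 0, "validity": 0, "mono": 0}
    for a in range(size):
        for b in range(a + 1, size):          # upper triangle, diagonal excluded
            if a // n == b // n:
                kind = "mono"                 # same source, different traits
            elif a % n == b % n:
                kind = "validity"             # same trait, different sources
            else:
                kind = "hetero"               # neither trait nor source shared
            sums[kind] += R[a][b]
            counts[kind] += 1
    h = sums["hetero"] / counts["hetero"]
    v = sums["validity"] / counts["validity"]
    w = sums["mono"] / counts["mono"]
    return {"M": h + (v - h) / n + (w - h) / m,
            "MxT": v - h,
            "MxS": w - h,
            "E": 1 - v - w + h}


# Illustrative input only; these are not Lawler's (1967) correlations.
R = build_mtmm(3, 2, mono=0.5, validity=0.4, hetero=0.2)
vc = direct_components(R, 3, 2)
for name, value in vc.items():
    print(name, round(value, 3))
```

Under these assumptions the hetero-hetero average, the two interaction components, and the error term sum to 1, which gives a quick consistency check on any set of estimates obtained this way.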

It is important to realize that the estima- 
tion of the variance components provides 
evidence interpretable for within-study com- 
parisons, but not across studies. It is also 
of interest to compare across matrices rela- 
tive to convergent and discriminant validity. 
There are several cases where this would be 
desirable: (a) when the multitrait-multi- 
method design is replicated, for instance, a 
longitudinal study; (b) across studies by 
different investigators to make a practical 
determination about which set of methods 
and/or traits to use; and (c) to compare 
validity of methods and/or traits within one 
study, for instance, performance versus per- 
sonality traits, peer versus self-ratings. 

One method to do this is to compare the 
three variance components relative to their 
error variance. That is, one can derive 
three indexes by using the formula, true 
variance/true plus error variance, to indicate 
the amount of convergence, discrimination, 
and method bias in a matrix in a form ame- 
nable to intermatrix comparisons. Thus, for 
the Lawler (1967) data presented earlier, 
these indexes and computational formula are 
in Table 6. Obviously, relative to each other, 
these indexes provide the same information 
earlier discussed using variance components. 
Their usefulness is that they do provide 
indexes for comparison across matrices with 
differing error variance.
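
The index formula, true variance divided by true plus error variance, is simple enough to check by hand; the short Python sketch below does the arithmetic for the superior-and-peer variance components reported in Table 7. The confidence-interval helper follows the Fisher Z suggestion in the note to Table 6, but the function names, the 1.96 critical value, and the sample size of 113 (inferred here from the 112 degrees of freedom for managers) are assumptions rather than values stated by the authors.

```python
import math


def index(vc_effect, vc_error):
    """True variance over true plus error variance."""
    return vc_effect / (vc_effect + vc_error)


def fisher_interval(r, n_ratees, z_crit=1.96):
    """Approximate large-sample interval for r via the Fisher Z transformation."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n_ratees - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Superior-and-peer variance components as read from Table 7.
peer = {"M": 0.504, "MxT": 0.199, "MxS": 0.140}
error = 0.293

indexes = {name: index(value, error) for name, value in peer.items()}
low, high = fisher_interval(indexes["M"], n_ratees=113)

for name, value in indexes.items():
    print(name, round(value, 2))
print("Manager index interval:", round(low, 2), round(high, 2))
```

Because each index divides by its own error term, two matrices with different amounts of unexplained variance can be compared on the same 0-to-1 scale.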


An example of the use of these indexes, 
and the short-cut method for estimating 
variance components can be derived from the 
Lawler (1967) data. If the investigator 
wished to determine whether to use peer or 
self-ratings (although it is apparent in these 
data), the matrix in Table 1 could be par-
titioned, say, for superior and peer ratings
versus superior and self-ratings. Thus, there
would be two 6 × 6 matrices in which vari-
ance components could be estimated and the
three indexes derived. These values are pre-
sented in Table 7. Examination of the variance
components across matrices is certainly more 
complex than using the derived indexes. Thus, 
on the basis of these indexes, the investi- 
gator and/or practitioner could clearly see 
the superiority of superior and peer ratings 
over self-ratings in terms of greater con- 
vergent validity and less source bias. 
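
The short-cut formulas themselves are not reproduced in this section, so the following Python sketch rests on an assumption: with standardized ratings, and with traits and sources treated as fixed, the components can be written in terms of three average correlations (same trait and different source, different trait and same source, different trait and different source). These expressions are a reconstruction, not the authors' stated formulas. They do return the Table 7 superior-and-peer components when every correlation in each class is set to .567, .508, and .368, which are hypothetical values back-calculated for this demonstration. The function names and the toy matrix are likewise hypothetical.

```python
from itertools import combinations


def partition(R, n_traits, keep_sources):
    """Keep only the rows and columns of the chosen sources (source-major order)."""
    idx = [s * n_traits + t for s in keep_sources for t in range(n_traits)]
    return [[R[i][j] for j in idx] for i in idx]


def variance_components(R, n_traits, n_sources):
    """Estimate components from the three classes of off-diagonal correlations."""
    sums = {"validity": 0.0, "mono": 0.0, "hetero": 0.0}
    counts = dict.fromkeys(sums, 0)
    for i, j in combinations(range(n_traits * n_sources), 2):
        if i % n_traits == j % n_traits:
            key = "validity"   # same trait, different source
        elif i // n_traits == j // n_traits:
            key = "mono"       # different trait, same source
        else:
            key = "hetero"     # different trait, different source
        sums[key] += R[i][j]
        counts[key] += 1
    a, b, c = (sums[k] / counts[k] for k in ("validity", "mono", "hetero"))
    m_x_trait = a - c
    m_x_source = b - c
    error = 1.0 - a - b + c
    manager = c + m_x_trait / n_traits + m_x_source / n_sources
    return {
        "manager": manager,
        "manager_x_trait": m_x_trait,
        "manager_x_source": m_x_source,
        "error": error,
    }


def toy_matrix(n_traits, n_sources, a, b, c):
    """Hypothetical matrix in which every correlation of a class is equal."""
    size = n_traits * n_sources
    R = [[1.0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        if i % n_traits == j % n_traits:
            r = a
        elif i // n_traits == j // n_traits:
            r = b
        else:
            r = c
        R[i][j] = R[j][i] = r
    return R


full = toy_matrix(3, 3, 0.567, 0.508, 0.368)            # 3 traits x 3 sources
two_sources = partition(full, 3, keep_sources=[0, 1])   # a 6 x 6 sub-matrix
for name, value in variance_components(two_sources, 3, 2).items():
    print(f"{name}: {value:.3f}")
```

Calling `partition` with a different pair of sources and comparing the resulting components or indexes mirrors the comparison of rating sources described above.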



AN EXAMPLE OF ANALYSIS WITH A
LARGER DATA SET

Method 

Although some of the conclusions relative 
to the Lawler (1967) data set using the 
analysis of variance technique were not sub- 
stantially different from those made using 
the conventional criteria (Campbell & Fiske, 
1959), the usefulness of the analysis of vari- 
ance technique is based in part on its effi- 
ciency in summarizing the available infor- 
mation in a multitrait-multimethod matrix. 
The investigator faced with a large matrix 


TABLE 7
COMPARISON OF RATING SOURCES

Source                 Superior      Superior
                       and peer      and self
Variance components
  Manager (M)            .504          .264
  M X Trait (T)          .199          .109
  M X Source (S)         .140          .379
  Error                  .293       [illegible]
Indexes
  M                      .63           .36
  M X Trait              .40           .19
  M X Source             .32           .44

would find it cumbersome and tedious (and
subject to more subjective judgment error) 
to evaluate it using the conventional ap- 
proach. The advantages of this analysis of 
variance technique can best be illustrated 
by examining a larger data set—a 60 x 60 
matrix including 20 ratings from three dif- 
ferent respondents. 

These data represent a portion of the 
data collected for a longitudinal study of 
managerial performance described elsewhere 
(Kavanagh, MacKinney, & Wolins, 1970; 
MacKinney, 1967). The subjects were 658 
managers at three supervisory levels in 24 
different plant locations within the Owens- 
Illinois Company. The three types of man- 
agers in this study are plant managers (PM), 
department heads (DH), and foremen (FM). 
The “key level” of the study is the DH, 
operationally defined as a manager who 
supervises at least two FM, that is, he is a 
middle manager. The other manager levels 
are defined in terms of the “key level.” The 
PM is the immediate superior of the DH. 
In most cases, this is the manager of the 
total plant, but in some larger plants there 
are administrative managers to whom the 
DH reports. These latter persons were clas- 
sified PM for this research. The FM level 
consists of two subordinates of the DH, his 
most effective and least effective, designated 
FM+ and FM- in this study. The FM
designated most and least effective were
selected by each DH from his group of foremen.
It was felt this selection of two foremen
would provide a representative sample of the 
attitudes, behavior, etc., of the total FM 
group under each DH without collecting data 
from every foreman. 

Thus, the DH comprises the unit of analysis.
Each DH unit includes the DH, his
superior (PM), and two subordinates (FM+
and FM-). Data were collected from 183
such DH units. 

The data were collected by questionnaire 
at each plant by a plant coordinator, usually 
the Industrial Relations Director or his as- 
sistant. The total research variables were 
described in an unpublished paper by MacKinney,⁸
but, for the purpose of this report,
performance ratings of the DH on 20 traits
were collected from the PM, FM+, and
FM-.

The ratings were divided into three major
sections: (a) functions of the job, (b)
subjects of the job, and (c) personal traits.
The first two were derived from the unpublished
work of Dunnette and Kirchner⁹
from segments of their “Management Questionnaires”
devised for research purposes
within the Minnesota Mining and Manufacturing
Company. “Functions” include eight
areas of responsibility: planning, investigating,
coordinating, evaluating, supervising,
staffing, negotiating, and representing. “Subjects”
include six general categories: personnel,
finances, materials, markets, methods, and
equipment. A detailed list of specific activi- 
ties accompanies each function and subject 
area describing and defining it in detail. 

The third section of the ratings instrument 
is based on the work of Chew and Howell 
(1960) and a subsequent unpublished factor 
analysis by Chew.¹⁰ This section assesses six
constellations of personal traits of the ratee: 
intellectual capacities, human relations, con- 
cern for quality, leadership orientation, inde- 
pendence, and achievement orientation. As be- 
fore, a detailed description accompanies all 
stimuli. The six areas were named by Mac- 
Kinney (see Footnote 8) based on Chew’s 
analysis of ratings by middle managers of 
white-collar supervisors. It is obvious that the 
latter stimuli are strongly oriented toward the 
manager’s personality rather than his objec- 
tive performance on-the-job, as were the two 
rating sections discussed earlier. 


Results 


Total matrix. The 60 X 60 matrix repre- 
senting the ratings from three sources on the 
DH is presented in Table 8. Although one 


8 Unpublished paper entitled “The longitudinal 
study of manager performance: Phase I variables,” 
1967. 

9 Drs. Marvin D. Dunnette and Wayne K.
Kirchner, personal communication, May 1966. The 
investigators are deeply indebted to Drs. Dunnette 
and Kirchner for making these materials available. 

10 The investigators would like to thank Dr. W. 
B. Chew for providing the unpublished information 
and for permission to use this information here.

TABLE 8
INTERRELATIONS AMONG RATINGS OF DH
[60 X 60 matrix of correlations among the 20 stimuli as rated by PM, FM+, and FM-; entries not recoverable from the source]


can easily determine the amount of convergent
validity present, examination of this
matrix using the three criteria for discriminant
validity suggested by Campbell and
Fiske (1959) obviously would be a difficult
task.

Assuming the analysis of variance model,
however, results in a simplified, interpretable
analysis. The results of the significance tests
along with the estimates of the variance components
are presented in Table 9. Each source
of variance is significant (p < .01).
However, the sizes of the variance components
lead to the following inferences. There
appears to be good convergent validity (main
effect of managers), but it must be noted




that there is a larger effect on the ratings
due to “halo” (Manager X Source). The
rather weak discriminant validity present
(Manager X Trait) indicates little discrimination
of managers on traits, a finding not
surprising in light of the large source bias.
Thus, even though there is good agreement
across raters on traits, there is little discrimination
within traits, which suggests reducing
the number of rating dimensions.

In addition, the size of the error variance
component is approximately equal to the Manager
and Manager X Source effects, thus indicating
the ratings are subject to unknown
sources of variance about as much as they are
subject to these effects. The relatively low


magnitude of the Manager x Trait variance 
component in comparison with the other ef- 
fects, particularly error, means the differen- 
tiation of managers on traits is fairly poor. 
The small F ratio and the large degrees of 
freedom verify this inference leading to the 
conclusion that this effect may be of little 
practical importance. 
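
The degrees of freedom behind these F ratios follow directly from the design. A minimal sketch, assuming a fully crossed Manager x Trait x Source layout with one rating per cell (the function name is illustrative):

```python
def mtmm_degrees_of_freedom(n_managers, n_traits, n_sources):
    """Degrees of freedom for the manager-related sources of variance."""
    base = n_managers - 1
    return {
        "Manager": base,
        "Manager x Trait": base * (n_traits - 1),
        "Manager x Source": base * (n_sources - 1),
        "Error": base * (n_traits - 1) * (n_sources - 1),
    }


# 183 DH units, 20 traits, three rating sources (PM, FM+, FM-).
for source, df in mtmm_degrees_of_freedom(183, 20, 3).items():
    print(f"{source}: {df}")
```

For the present data this gives 182, 3458, 364, and 6916, the values listed in Table 9. With error degrees of freedom this large, even a small Manager X Trait component can reach statistical significance.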

The three indexes listed in Table 9 allow
a comparison with the previous study on managerial
performance (Lawler, 1967). Referring
to Table 3, it is interesting to note the
estimate of error variance is fairly close
across studies. Comparing the indexes in
Table 6 with those in Table 9, both matrices
have about the same amount of convergent
validity, but the Lawler (1967) data fares
much better in terms of discriminant validity.
The greater rater “halo” in this research
(Table 9) complements the fact of less discriminant
validity, that is, there is more carry-over
across ratings of different stimuli.

Personality versus performance traits. As
indicated earlier, the last 6 rating stimuli are
oriented to the manager’s personality while
the first 14 are oriented toward job performance
dimensions. The total matrix was divided
into two matrices, a 42 X 42 for the performance
trait ratings and an 18 X 18 for personality
traits. Table 10 presents the variance
components estimated by the short-cut procedures¹¹
(Table 5) and the indexes of interest.
Several points are immediately evident
from Table 10. The ratings across personality
traits contain considerably less sampling variance
(error) than any of the other matrices
involving ratings of managerial performance.
The higher convergent validity (Manager effect)
and source bias (Manager X Source effect)
indicate there is more agreement among
raters and more “halo” for personality traits
than for performance traits. However, from
the values of the Manager X Trait interaction,
there is little difference between ratings
of personality and performance stimuli in
terms of discriminant validity. Thus, the decrease
in error on the personality dimensions

11 The analyses of variance following the longer
computational technique were also calculated. The F
ratios were all significant (p < .01), and the estimated
variance components by this longer method
were within rounding error of those in Table 10.



Note. For N = 183, correlations of .15 and .19 are significant at the 5% and 1% levels, respectively.
a Decimals omitted, negative correlations underlined, dotted lines for hetero-hetero triangles omitted; circled correlations = validity diagonal.
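
The significance thresholds in the note to Table 8 can be checked approximately. The sketch below uses the large-sample Fisher z approximation, in which atanh(r) has standard error 1/sqrt(N - 3). This is an assumption about method, since the note does not say how the cutoffs were obtained, and the function name is illustrative.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha):
    """Smallest |r| significant at two-tailed level alpha (Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: |r| >= {critical_r(183, alpha):.2f}")
```

Rounded to two decimals, the approximation agrees with the .15 and .19 cutoffs given in the note.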


TABLE 9
ANALYSIS OF VARIANCE: TOTAL MATRIX

Source                   df
Manager (M)             182
M X Trait (T)          3458
M X Source (S)          364
Error                  6916
[Remaining columns not recoverable from the source]

a Refer to Table 6.


did not lead to more differentiation of man- 
agers on traits, but rather, more bias within 
raters and more agreement between raters. 
Therefore, it appears that comparison be- 
tween the performance and personality traits 
gives no absolutely certain indication of the 
superiority of one over the other. However, 
the decision as to which ones to use should 


be made easier with the simplified information 
in Table 10. 


Discussion 


This research was aimed at providing in- 
formation relative to three general questions 
about ratings of managerial performance. The 
first question concerns the validity of the 
ratings, that is, their relevance to the ultimate 
criterion. From the results, the ratings in this 
study satisfy the requirements of convergent 
validity. Examination of the validity diagonals
in Table 8 reveals generally good agreement
among raters on traits, although it is
clear there are differences among traits on this 
criterion. 

In terms of discriminant validity, the re- 
sults are somewhat discouraging. The variance 
component and index representing discrimi- 
nant validity (Manager x Trait) in Table 9 
indicate very little differentiation among 
traits for these ratings. This low discrimina- 
tion seems to indicate that the number of 
stimuli to be rated could be reduced. That is,
since each trait would possess little discrimi- 
nant validity, it makes no sense to rate all 
20 dimensions. One could probably do just as 
well with fewer traits, and the degree of con- 
vergent validity for stimuli could well be the 
basis for selection. Of course, the rather large
amount of “halo” present in the ratings reinforces
the prescription of reduction in the
number of rated dimensions.




The second general question in this research 
was addressed to the dimensionality of man- 
agerial job performance in terms of practical 
and scientific efficiency. In the previous para- 
graph, a partial answer was given—ratings of 
20 dimensions seems unnecessary. However, 
comparison of this research (Table 9) with 
Lawler’s (1967) study (Tables 3 and 6), 
which used only three dimensions, reveals 
several things. The differences between the 
three indexes are small, although the larger 
discriminant validity for the Lawler (1967) 
data is important. Nevertheless, both matrices 
contain considerable error. 

Scientifically, it seems the issue is whether 
the investigator can do better (i.e., reduce
the error) by using a certain number of job 
dimensions. Alternately, it may well be that 
the fairly close magnitude of the sampling 
variance present in these two studies means 
there is a characteristic error associated with 
judgmental data. Although it is risky to gen- 
eralize on the basis of two studies, the marked 
differences in raters and traits used across 
these two studies would lend some support to 
the existence of characteristic error. In fact,
Boruch et al. (1970), in analyzing a set of
rating data from Larkin (1969), derived a
set of variance components with a pattern
highly similar to the two sets of data in this
report. Using the ratings by two subordinates
of 111 first-line supervisors, the variance components
were estimated as: (a) Manager =

TABLE 10
VARIANCE COMPONENTS AND INDEXES:
PERFORMANCE AND PERSONALITY TRAITS

Source                 Personality   Performance
                       traits        traits
Variance components
  Manager (M)          [illegible]     .298
  M X Trait (T)        [illegible]     .059
  M X Source (S)       [illegible]     .390
  Error                [illegible]     .387
Indexes
  M                      .56           .44
  M X Trait              .15           .13
  M X Source             .62           .50

.30; (b) Manager X Trait = .11; (c) Manager
X Source = .34; and (d) error = .45. Thus,
it may well be that decreasing the error in 
judgmental data may be quite difficult. 

Further, the implicit assumption that de- 
creasing the amount of error in a design is 
advantageous may not hold for rating data. 
When the 60 X 60 matrix was partitioned
into personality trait ratings and performance 
trait ratings, there was a substantial decrease 
in the error variance for personality traits 
(Table 10). However, this decrease over the 
total matrix did not lead to increased dis- 
criminant validity which was notably lacking, 
but rather, to an increase in “halo” and con- 
vergent validity. Thus, assuming discrimina- 
tion among rating dimensions to be necessary 
for good validity, there is no particular rea- 
son to use the six personality traits exclu- 
sively. The fact of increased convergent va- 
lidity seems to be offset by the increase in 
“halo” effects, but, of course, this feeling may 
vary with investigators. 

In contrast to the partitioning described in 
the last paragraph, the split of the Lawler 
(1967) matrix by raters produced different 
results (Table 7). When self-ratings are elim- 
inated, there is a sizable decrease in error 
relative to the total matrix (Table 3). The 
importance of this error reduction is that it is 
accompanied by increases in convergent and 
discriminant validity as well as a decrease in 
the “halo” present. Obviously, these are all 
highly desirable changes, and thus, the num- 
ber of dimensions being rated may be less 
important (to validity) than who the raters 
are. 

Following this logic, three matrices were
formed from the 60 X 60 (Table 8) using
combinations of two raters, that is, PM versus
FM+, PM versus FM-, and FM+ versus
FM-. Computation of the variance components
and three indexes revealed little difference
in value across the three matrices as to
convergent and discriminant validity, source
bias, and error. There was slightly more convergent
validity between the two foremen, a
result not unexpected considering their shared
viewpoint from the same organizational level.
Thus, based on these data, it does not appear
that source of ratings makes any great difference
relative to validity.

Concerning dimensionality then, it appears 
in this research that the number of stimuli 
must be reduced. This decision is based on 
two considerations. The low discriminant va- 
lidity on the total ratings indicates it is sci- 
entifically inefficient to include all 20 dimen- 
sions, and practically, differentiation of job 
performance into 20 dimensions may well 
exceed the capacity of the raters. However, 
with this decision in mind, the investigator is 
still faced with the question of which traits 
to keep. As mentioned earlier, degree of con- 
vergent validity for stimuli would seem to be 
an appropriate criterion to use. That is, the 
investigator faced with a large matrix of 
trait-source correlations could select, for fu- 
ture use, the traits with the higher validity 
diagonal values; and then, using the short 
computational formulas for variance com- 
ponents, determine if he is improving or de- 
creasing his “relevance” relative to the total 
matrix. That is, the inclusion of three raters 
with varying perspectives on the job behavior 
of the ratee allows the investigator to exam- 
ine the differing degrees of agreement on the 
various job performance dimensions. Although 
this agreement may be indicative of inter- 
rater reliability, the convergence of judgments 
from differing organizational viewpoints in 
the context of job performance demands fur- 
ther interpretation. It seems reasonable that 
traits with higher diagonal values are more 
important and meaningful for determining 
what defines effective job performance in this 
organization. Thus, using the rating dimen- 
sions with higher validity diagonal values 
among the three raters should result in cap- 
turing more relevant, meaningful (visible?) 
information about managerial job perform- 
ance. In fact, it would be quite possible to 
develop an iterative computer program to 
take first two traits, then three, and so on, 
with tolerance limits to indicate when an 
increase in dimensions did not lead to signifi- 
cantly increased validity values. 
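
The iterative procedure suggested above can be sketched in Python. Everything specific here is an assumption made for illustration: traits are ranked by their mean validity-diagonal value, the stopping rule watches the Manager (convergent validity) index, the component expressions are reconstructed for standardized ratings with fixed traits and sources, and the small correlation matrix is invented toy data.

```python
from itertools import combinations


def manager_index(R, traits, n_traits, n_sources):
    """Convergent-validity index for the sub-matrix restricted to `traits`."""
    sums = {"a": 0.0, "b": 0.0, "c": 0.0}
    counts = dict.fromkeys(sums, 0)
    cells = [(s, t) for s in range(n_sources) for t in traits]
    for (s1, t1), (s2, t2) in combinations(cells, 2):
        # a: same trait, different source; b: same source; c: neither
        key = "a" if t1 == t2 else ("b" if s1 == s2 else "c")
        sums[key] += R[s1 * n_traits + t1][s2 * n_traits + t2]
        counts[key] += 1
    a, b, c = (sums[k] / counts[k] for k in "abc")
    error = 1.0 - a - b + c
    manager = c + (a - c) / len(traits) + (b - c) / n_sources
    return manager / (manager + error)


def mean_validity(R, trait, n_traits, n_sources):
    """Average validity-diagonal value of one trait over all pairs of sources."""
    pairs = list(combinations(range(n_sources), 2))
    total = sum(R[s1 * n_traits + trait][s2 * n_traits + trait]
                for s1, s2 in pairs)
    return total / len(pairs)


def select_traits(R, n_traits, n_sources, tolerance=0.01):
    """Start with two traits and add more until the index stops improving."""
    ranked = sorted(range(n_traits),
                    key=lambda t: mean_validity(R, t, n_traits, n_sources),
                    reverse=True)
    chosen = ranked[:2]
    best = manager_index(R, chosen, n_traits, n_sources)
    for trait in ranked[2:]:
        trial = manager_index(R, chosen + [trait], n_traits, n_sources)
        if trial - best < tolerance:
            break
        chosen, best = chosen + [trait], trial
    return chosen, best


# Toy data: 4 traits x 3 sources, variables in source-major order.
validity = [0.60, 0.30, 0.70, 0.25]
n_traits, n_sources = 4, 3
size = n_traits * n_sources
R = [[1.0] * size for _ in range(size)]
for i, j in combinations(range(size), 2):
    if i % n_traits == j % n_traits:
        r = validity[i % n_traits]
    elif i // n_traits == j // n_traits:
        r = 0.40
    else:
        r = 0.20
    R[i][j] = R[j][i] = r

chosen, index = select_traits(R, n_traits, n_sources)
print(chosen, round(index, 2))
```

A different stopping criterion, for example one that also tracks the Manager X Trait index, could be substituted without changing the overall loop.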

Obviously, the procedure outlined in the 
previous paragraph speaks not only to the 
issue of scientific efficiency, but also practical 
efficiency. Consider the investigator who obtains
ratings on 20 stimuli from three raters.

TABLE 11
VARIANCE COMPONENTS AND INDEXES
FOR FIVE TRAITS*

Source               Variance component   Index
Manager (M)                .532            .66
M X Trait (T)              .059            .17
M X Source (S)             .413            .60
Error                      .279

* The five traits are Planning, Investigating, Methods,
Human Relations, and Leadership Orientation.

He is faced with the practical issue of reduc- 
ing the size of the rating form to appease 
complaints of length, but in such a way as to 
maintain good validity. Therefore, it would 
be to his advantage to reduce the required 
ratings while maintaining the relevance (va- 
lidity) of his immediate criteria of job per- 
formance. 

Moving to the study at hand (Table 8),
suppose we select the five traits with the
highest agreement across raters (numbers 1, 2, 
13, 16, and 18). The estimated values for the 
variance components and indexes are presented 
in Table 11 for the three raters on these five 
dimensions. Comparing these results with 
those in Table 9, it is quite apparent that 
using these 5 dimensions, rather than all 20, 
not only is more efficient in a practical sense 
(i.e., managerial time spent completing them), 
but also scientifically in terms of validity. 

A note of caution here relative to generalizing
from these results is needed. This reduction
procedure will result in tailor-made rating
forms for the organization providing the original
data, and, of course, the reduced form
should be subjected to validation at a different
time or on a different population sample.
It would be foolish, for example, for
someone to take the five traits identified above
for use in their organization without local
validation work.

The third general question about managerial
performance in this research involved
the content question. It is obvious from Table
10 that the previously assumed null hypothesis
of no relevance for ratings of personality
traits for job performance is not supported in
this study. In fact, on the basis of the results,
it appears that ratings of personality
traits fare somewhat better than do those on 
performance traits, contrary to some logical 
arguments (MacKinney, 1960). That is, with 
personality stimuli, there is a sizable reduc- 
tion in error and an increase in convergent 
validity over the total matrix (Table 9) not 
exhibited by the ratings on the performance 
dimensions. The drawback is, of course, that 
there is also an increase in “halo” and no 
substantial increase in discriminant validity. 
These results present a strong empirical
argument to those opposed to ratings of personality
traits (for managerial performance)
because there has only been “soft-headed,
intuitive” evidence. These data may provide a
partial solution to the problem stated by
Chew and Howell (1960), that is, that there
is a practical demand for personality trait
ratings, and it is our job to overcome the
difficulties (one of which is lack of empirically
demonstrated validity) in using them. It
seems that the approach suggested earlier,
reducing dimensions on the basis of convergent
validity, may provide personality dimensions
relevant for rating job performance. The
five selected traits in Table 11 consist of
three performance variables and two personality
ones. It is important to realize, as argued
earlier, that the choice of certain dimensions
over others in terms of validity diagonals
represents more than just interrater reliability.
This is partially demonstrated by the
fact that using 5 traits (Table 11) compared
with 20 traits (Table 9) results in
more interpretable results concerning job performance
due to the decreased error. Of
course, the further verification of these five
dimensions as more relevant for managerial
job performance requires replication of the
total rating scheme on this same sample or
cross-validation on a different sample. Regardless,
the appearance of the personality
dimensions in the reduction analysis seems
to argue that one should use the dimensions
that best represent job performance in the
situation, whatever they may be.


EFFECTS OF FILM MATERIAL UPON

Studies of symbolic modeling influences upon human behavior are reviewed.
Experiments concerned with film influences upon aggression, phobic and
altruistic responses, as well as their impact upon social judgments and
communications are discussed.

This paper reviews recent studies concerning
the influence of film material (television
and movies) upon human behavior. Considerable
data now exist relevant to these effects,
much of which is recent and worthy of review.
The present effort is limited to those investigations
directly addressed to the role of films
in affecting socially sanctioned behaviors such
as aggression and altruism. Those studies pertaining
to the content of film and television
entertainment or those addressed to the demographic
or personality characteristics associated
with viewing habits are not included.
The focus is upon experimental studies
conducted within laboratory settings.


AGGRESSION


cance, but becau


1962),


Discussion of the impact of aggressive film
content upon behavior has been dominated by
the now-old controversy concerning the notion


1969a). Walters,
Acker (1962)


attendants who had seen an aggressive fight
scene would give stronger electric shocks for
longer durations to the experimenter's accomplice
than attendants who had viewed a film
of innocuous teenage activities. Berkowitz
and Geen (1966) exposed 88 male college
undergraduates to one of two films. One group
viewed a film of a prize fight scene from the
movie The Champion, the other an exciting
movie of a track meet. Using the same measure
of effects as Walters et al. (administering
electric shocks), subjects, previously insulted
by the experimenter, who had viewed the
fight scene and whose target was similar in
name to that of the film’s victim, gave more
electric shocks than control subjects. These studies
thus suggest those particular characteristics
of person and film which may trigger aggressive
acts by the viewer. These conditions are
reviewed subsequently.
The impact of television models upon behavior
has not been limited to that
of adults. Bandura, Ross, and Ross (1963a)
compared the effectiveness of real-life aggressive
models, filmed models, or a filmed model
dressed as a cat (presumably analogous to a
cartoon figure) upon the aggressive behavior
of nursery school children. A control group
witnessed no film, but was frustrated in the
same manner as experimental subjects. Observations
of the children’s imitation of highly
novel forms of aggressive behavior as well as
nonimitative aggression were made by two
judges while the child was in a setting other
than that in which he viewed the films. The
authors reported that any exposure to an
aggressive model, be it a live, filmed, or
cartoon figure, increased the viewer’s total
aggressive behavior toward inanimate objects.
All models had a significant effect in evoking
imitation of novel forms of aggression, the
most influential being the live model. The
film model was not more effective than the
cartoon figure in altering imitative aggression.
The three types of models did not differ in
their impact upon total aggression or more
specific types of imitative aggressive
performance.

Since the measures of aggression were 
striking a toy doll, shooting a toy pistol,
verbal imitation of aggressive themes, and 
shooting darts, Aronfreed (1968) has sug- 
gested that such behavior reflected play, not 
anger. While this may be true, Bandura et al. 
(1963a) speculated that most aggressive be- 
havior may be learned for "prosocial" pur- 
poses, and they suggested that high arousal 
produced, for example, by frustration, may 
elicit overlearned responses, and thus may 
be employed in the service of aggressive 
goals. If they are right, Aronfreed's criticism 
becomes irrelevant. 

While Bandura et al. (1963a) have demon- 
strated the generalization of imitative aggres- 
sion across situations, Mussen and Rutherford 
(1961) found that the objects of aggression 
need not be those originally presented as the 
filmed model's target. They found that verbal 
statements by first-grade boys and girls sug- 
gesting their desire to destroy a balloon were 
increased after exposure to a cartoon film 
showing aggressive behavior by animated ani- 
mals and flowers. However, one can reason- 
ably challenge the degree to which verbaliza- 
tions of “pop” reflect behavioral dispositions 
towards aggressive actions. Unfortunately, 
their experimental design did not allow for a 
motoric expression of such a disposition. 

Two experiments have been conducted con- 
cerning the role of film models in affecting 
children's aggressive behavior directed toward 
other persons. While evidence exists concern- 
ing the role of symbolic models in affecting 
assaults upon a variety of inanimate objects 
and verbal expressions of such intents, few 
experiments have employed another human 
as the object of children's aggression. Siegel 
(1956) exposed children between the ages of 
3.9 and 5.1 years to two cartoons, one empha- 
sizing aggression (Woody Woodpecker), the 
other lacking such a theme (Little Red Hen). 


The order of film presentation was counter- 
balanced with a week's separation between 
the occasion of exposure and testing. Each 
child was rated as to his aggressive behavior 
toward a peer by a judge who was unfamiliar 
with the treatment conditions after each 
film showing. No film effects were found. In 
a later experiment, Hanratty, Liebert, Morris, 
and Fernandez³ found that children are more
likely to emit assaultive behavior toward a
person dressed as a Bobo doll after having
viewed a filmed model aggress against an 
actual Bobo. Thus, it appears that aggressive 
responses of children that are provoked by 
a filmed model can be directed toward living 
as well as inanimate objects. 

There is reason to believe that preferences 
concerning the viewing of aggressive activity 
can be altered through films. Lovaas (1961) 
found that 4- to 6-year-olds, from low-income 
families, preferred witnessing one doll striking 
another doll to a nonaggressive scene after 
viewing a film demonstrating aggressive acts. 

While it does appear likely that aggressive 
themes can elicit aggression or dissipate in-
hibitions concerning such acts, it is important
to determine those conditions within the film 
which heighten its effects. Berkowitz and 
Rawlings (1963) exposed male and female 
undergraduate students to either an anger- 
provoking or a neutral experience and then to 
a film of a prize fight in which the aggression 
was either “justified” or not on the basis of 
instructions concerning the victim’s character. 
Anger toward the experimenter, as indexed by 
one of two questionnaire items, was increased 
if the subject was made angry toward the 
experimenter prior to the film and had wit-
nessed a “justified” aggression. As the authors
pointed out, the “just desserts” rationale for 
aggression may be a powerful disinhibitor of 
such behavior by the viewer. 

Consequences of the aggressive action of 
the filmed model will also affect the children’s 
imitation of that model. Bandura, Ross, and 
Ross (1963b) exposed nursery school children


3 Hanratty, M. A., Liebert, R. M., Morris, L. W.,
& Fernandez, L. E. Imitation of film-mediated ag-
gression against live and inanimate victims. Paper
presented at the meeting of the American Psycho-
logical Association, Washington, D. C., September
1969.
to one of several types of films depicting
a controversy between two children concerning
one child’s toys. One film showed a boy
hitting and kicking the owner, and, as a re-
sult, subsequently possessing the toy and
receiving other rewards (cookies, toys, and a 
coke). The film terminated with the an- 
nouncer indicating that the boy was the 
"victor." A second group viewed a film 
demonstrating a repulsed attack with the ag- 
gressing child losing his bid for possession of 
the rewards. Another group of children wit- 
nessed a third film showing the two television 
models playing vigorously but not aggres- 
sively, while a fourth group of subjects did 
not view any film presentations. Imitation of 
aggression, as measured by observation of the 


groups. 
the experimental design was 


confounding of obtained 
resources with differential-affect arousal and 
perhaps experimenter approval. The rewarded
aggression scene showed the victor as quite
happy, while the successful defender was por-
trayed as being affectively neutral. While this
arrangement is perhaps more analogous to
real life than that which would show an af-
fectively neutral aggressor obtaining the new
resources, it does


by em-
pathic responses. Given the importance of
empathy in guiding children's behavior 


(Aronfreed, 1968; Midlarsky & Bryan, 1967), 
such a confound would 
It is perhaps because


not the 
unsuccessful one. Interestingly, children indi-
interview that 
model and dis- 


ose conditions 
where rewards were given for the aggression.


Apparently, some children reasoned that the 
victim deserved such treatment because of his 


selfishness and his lack of ability to control 
the aggressor. According to the viewer, the 
defeated victim was culpable for having lost. 
On the other hand, children morally con- 
demned the unsuccessful thief. Whether such 
judgments of the models were the result of 
dissonance reduction, as the authors sug- 
gested, or conformity to the judgments ex- 
pressed by the film announcer, or due to 
differential affect arousal, cannot be deter- 
mined. 

A relationship between success and emula-
tion was also reported by Albert (1957). He
exposed 8- to 10-year-old children to one of 
three conditions. In one film, Hopalong 
Cassidy behaved aggressively and successfully 
against the villain. A second film depicted 
the same sequence with the exception that 
Cassidy lost the fight by being shot by the 
villain, while the third film demonstrated the 
fight scene without a resolution. On the
Rosenzweig Picture-Frustration Test, used to mea-
sure the child's extrapunitiveness, aggression
scores increased from the pretest measure
only under those conditions in which Cassidy
was successful. The outcome of aggression
was again demonstrated to alter aggressive
verbal behavior.

The importance of obtained resources in 
affecting children's judgments of the attrac-
tion of the model is demonstrated in a study
by Zajonc (1954), using a different medium, 
the comic book. In that experiment, 10- to 
14-year-old children esteemed the hero on the 
basis of the hero's success in meeting a crisis 
rather than his assumption of either an affilia- 
tive and cooperative interaction or a power- 
oriented position vis-à-vis his subordinates. 
One might reasonably challenge Zajonc’s as-
sumption concerning the degree to which
power orientations of an adult in a crisis sig-
nificantly depart from commonly accepted
social norms held by children. No independent
measure of children’s reactions to these
orientations was obtained.

In a recent experiment (Bryan, see Footnote 4), 7-year-
old children viewed one of three filmed scenes.
In one, a 9-year-old girl forcibly confiscated


ised social transgressors. Un-
Northwestern University
candy from a 7-year-old girl; in another, the
oldest model unsuccessfully attempted to take 
the younger child's goods, while a third film 
depicted a cooperative interaction with no 
transgressions. Viewers were asked to make 
judgments concerning the oldest model and 
her desirability on such dimensions as nice- 
ness, naughtiness, likableness, as a playmate, 
sibling, and object of emulation. Children
rejected both the unsuccessful and successful 
transgressor, but were significantly more 
critical of the latter. The results thus contrast 
with those reported by Bandura et al. 
(1963b) and perhaps those reported by 
Zajonc (1954). These conflicting results may 
be attributable to a number of methodological 
differences, including the gender of the trans- 
gressing models, age of viewers, the oppor- 
tunity to imitate socially inappropriate behav- 
iors, experimenter reaction to the transgres- 
sion, the magnitude of the “sin” involved, 
and victim responses. 

Another dimension of film which may bear 
an important role in affecting aggressive be- 
havior is the similarity of the filmed scene 
or characters with those of the viewer 
(Feshbach, 1961). Berkowitz and Geen 
(1966) showed that frustrated subjects would 
administer stronger and longer electric shocks 
to another person than control subjects after 
observing a film in which the victim of the 
aggression bore the same name as the sub- 
ject’s frustrator. Hicks (1965) exposed 3- to 
6-year-old children to a film depicting an 
adult male or female, or a peer male or 
female, assaulting an inflated plastic doll. 
In the same manner as the experimental 
groups, a group of children were frustrated 
prior to test trials, but saw no film. Two
judges observed the viewer's imitative aggres-
sive responses to the plastic doll. Both the
main effects of film and viewer gender were 
significant. However, no significant interaction 
of sex of subject and sex of model was found. 
The imitative aggression of subjects witness- 
ing the peer male aggressor was significantly 
greater than of those who viewed the adult 
male or the peer female model conditions. 
When compared to the control subjects, all 
varieties of models significantly increased 
imitative performance. Not surprisingly, a
6-month follow-up found no lasting effects
of the film models upon these responses.

Recently, studies have been conducted to
establish those viewer traits which interact 
with aggressive film themes to facilitate subse- 
quent aggressive action by the observer. Of 
the viewer characteristics related to aggressive 
behavior, the most important appears to be 
frustration and its correlate, aggression. 
Feshbach (1961) has added new complexity 
to the catharsis concept by suggesting that 
the subject’s experience of vicarious aggres- 
sion will reduce subsequent aggressive action 
only if he is angered at the time of the film 
presentation. He found that previously 
angered college students exposed to a fight 
film were less likely to emit aggressive words 
on a word-association test administered by
the insulting experimenter than subjects not 
so angered, or than those exposed to the 
aggressive film scenes without being frus- 
trated. Berkowitz and Rawlings (1963) have 
suggested that subjects in the insult group 
exposed to the aggressive film may not have 
felt that the filmed aggression was justified 
and thus became more inhibited about such 
behavior than the noninsulted group. The 
description of the scene did not include suf- 
ficient details to evaluate this possibility. 
Virtually all other investigations of the inter- 
action of anger with film-instigated aggres- 
sion suggest that frustrated subjects may be 
particularly likely to show an increase in such 
responses following a film depicting violence. 
While frustration may not be a necessary 
condition for eliciting aggression (see Walters 
et al., 1962), it certainly does serve as an
instigator to it. As reported earlier, Berko- 
witz and Geen (1966) found that angered 
males gave longer and stronger electric shocks 
to another if they had been negatively evalu- 
ated (as indexed by seven electric shocks) 
rather than positively evaluated (as indexed 
by a single shock) by him, at least under 
those conditions where the eventual target 
of aggression had the same name as the 
victim in the film. Moreover, Berkowitz and 
Rawlings (1963) found that subjects, insulted 
by the experimenter prior to exposure to the 
aggressive film, indicated greater personal 
rejection of that experimenter than subjects 
insulted but not exposed to a violent film 
scene, or those exposed to the scene without
being previously demeaned.

Maccoby, Levin, and Selya (1955) have
demonstrated that frustration affects the sub- 
jects’ recall of aggressive behaviors. Fifth- 
and sixth-grade children, matched on sex and 
intelligence, were frustrated or not frustrated 
by means of a rigged spelling bee, and then 
were exposed to filmed aggression. A week
later the children who had been frustrated
were found to recall better than nonfrustrated
children the film’s central aggressive themes,
such as the identity of the aggressor. Inter-
estingly, however, the nonfrustrated chil-
dren had better recall than their frustrated
counterparts, for both the incidental details 
of the aggression (such as the position of 
incidental characters) and nonaggressive con-
tent. In a follow-up investigation employing
children from a semirural region, these results
were not replicated.


While some studies have failed to find 
sex differences in film-mediated aggression 
(Berkowitz & Rawlings, 1963; Mussen & 
Rutherford, 1961), several experiments have 
reported that boys are more likely than girls 
to both demonstrate film-produced aggression 
(Bandura et al., 1963a; Hicks, 1965) and
to better recall aggressive content (Maccoby 
& Wilson, 1957). Since children of both sexes 
were exposed to both aggressive boy and girl 
models in studies by Bandura et al. (1963a)
and Hicks (1965), and sex differences in 
imitative aggression were found, but not a 
model by sex of subject interaction, it seems 
reasonable to assume that gender role rather 
than identificatory objects (model-subject 
similarity) within the film inhibited the 
female’s aggressive behavior. 

While such other personal characteristics 
of respondents as intelligence (Himmelweit, 
Oppenheim, & Vince, 1958; Schramm, Lyle, 
& Parker, 1961), social class (Maccoby, 
1954), age (Hale, Miller, & Stevenson, 1968), 
and personal adjustment (Bailyn, 1959; 
Maccoby, 1954) have been correlated with 
various aspects of television viewing, it is yet 
to be determined that any are important 
influences upon imitative aggression of film 
models. Thus, while gender roles and frustra-
tion appear to play an important role in
imitation of aggressive models, such typically
powerful variables as intelligence and social
class generally have been ignored in labora- 
tory studies. 


PROSOCIAL BEHAVIOR


Considerable speculation and some data 
have been offered recently concerning the 
potential of films for producing socially 
desirable behaviors. Encouragement of its use
in psychotherapeutic efforts with chronic hos- 
pitalized psychiatric patients has been offered 
(Stoller, 1967), an experiment with its utility 
for the behavior control of juvenile delin- 
quents is now under way (Sarason & Ganzer, 
1969), and at least one investigation of its
potential therapeutic usefulness for school
settings has been conducted (Bandura &
Menlove, 1968). While cries concerning the 
affront of such practices to the “doctor-
patient relationship” are sure to be expressed,
it seems clear that the use of television as a
behavior controller will become increasingly 
greater. Moreover, the results of recent studies 
on the impact of "prosocial" models upon 
children's own sacrificing behavior and judg-
ments have implications for both the develop-
ment of public commitment and the relation-
ship of moral attitude to deeds (Bryan &
Walbek, 1970).

The development of courageous behavior 
by children observing such actions has
been recently demonstrated with both live 
(Bandura, Grusec, & Menlove, 1967) and 
filmed models (Bandura & Menlove, 1968). 
In the latter experiment, the investigators 
exposed 3- to 5-year-olds, all markedly fearful
of dogs, to one of three films on 4 alter-
In the single-model condition,
children observed a film in which a model
gradually approached and befriended a dog.


In the multiple-model film, observers wit- 
nessed a variety of children 
gradually, a variet 


h responses to several dogs. 
Children who witnessed either the single-
or multiple-model films exhibited stronger
approach responses to several dogs than
control subjects. The
follow-up measurement a month later showed 
those children exposed to the multiple-model 
condition to be significantly less fearful of 
the test dogs than the remaining children.

The most extensive study on the thera-
peutic uses of film models has been conducted
by Bandura, Blanchard, and Ritter (1969). 
In this most important experiment, the impact 
of film models, live models in conjunction 
with “guided participation,” and systematic 
desensitization techniques upon behaviors and 
attitudes of adults suffering from snake 
phobias were studied. Subjects exposed to the 
film models self-administered a graduated 35- 
minute film showing a variety of models 
approaching a snake. Subjects in the “guided 
participation” condition witnessed the experi- 
menter handle the feared snake and received 
instructions during their own approaches to 
the snake. Subjects treated by systematic 
desensitization were given relaxation treat- 
ments and presented instructions regarding 
the imagining of the feared stimuli. A control 
group underwent the many assessment pro-
cedures but did not receive treatment. All
three types of treatments were effective in in- 
creasing approach behavior, with the guided 
participation group showing the greatest gains. 
Moreover, treated subjects showed a greater 
approach response to a snake differing 
markedly in coloration from that employed in 
the training procedure than did the control 
respondents. The subjects’ reports of their 
fears when approaching the snake were also 
affected by the treatments with the employ- 
ment of models being most effective in reduc- 
ing this affect. The data suggest, however, 
that the guided participation group showed 
greater fear reduction than did the group ex-
posed only to the film models. Finally, at-
titudes toward reptiles, as assessed by attitude
scales and semantic differential ratings, were 
more favorable as a function of the type of 
treatment used. The guided participation group 
showed the greatest shift in this direction,
while both model and participation groups 
were superior to the systematic desensitiza- 
tion group in attitude changes. All treatments 
were superior to the control condition in pro- 
ducing more positive attitudes toward snakes. 
A follow-up study 1 month subsequent to the 
termination of treatment demonstrated that
treated subjects were more courageous in 
approaching the feared object, and reported 
even less fear than on the tests immedi- 
ately following treatment. However, attitudes 
toward reptiles became more negative during 
the interim, this being particularly so for 
those subjects of the systematic desensitiza- 
tion and control groups (who had, in the 
meantime, been treated by the film-model- 
only procedure). These dramatic and impor- 
tant results cannot simply be attributed to 
patient expectancy effects insofar as most 
subjects reported that they did not anticipate 
obtaining any benefits from the therapeutic 
programs outlined.

The therapeutic impact of film models may, 
however, be quite complex when dealing with 
the modification of aggressive behavior as a 
study by Walters and Willows (1968) demon- 
strated: 7- to 11-year-olds from a residential 
treatment center were observed and com- 
pared with nondisturbed children for a 6- 
minute testing period following the viewing 
of aggressive or nonaggressive film models. 
Control subjects were exposed to a shorter 
film containing no model (neutral film). 
Groups were matched for intelligence and age. 
Results demonstrated that nondisturbed boys 
increased nonaggressive imitative behavior 
relative to those exposed to a neutral film 
whether exposed to an aggressive or non- 
aggressive model. The results replicated those 
of previous studies in demonstrating that 
exposure to aggressive film models increased, 
relative to the nonaggression films, nondis- 
turbed children's imitation of such aggression 
(measured by both verbal and motor expres- 
sions). While emotionally disturbed boys 
were no more likely to emit aggressive re- 
sponses than nondisturbed boys, they were 
less likely to imitate nonaggressive motor 
behavior, but more likely to verbalize accept- 
able speech. 

Studies of filmed models have typically 
presented both motor and verbal representa- 
tions of the independent variables simultane- 
ously and without making them orthogonal, 
and have thus confounded their effects. The 
responses effected by various types of verbal 
and motor representations are relatively un- 
known, although both theory and data sug-
gest the relative autonomy of much of the
motor from much of the verbal system 
(Campbell, 1963; DeFleur & Westie, 1963; 
Mills, 1963). In a series of experiments, 
Bryan and Walbek (1970) and Walbek
(1969) investigated the relative impact of 
moral exhortations and behavioral examples 
of filmed television models upon children's 
judgments, social transmissions, and behavior. 
Of interest was the impact of a model whose 
behavior was discrepant from his preachings; 
that is, one who showed hypocrisy. The design 
of the experiments was such that the child 
was exposed to one of two behavioral ex- 
amples—a model who either donated money 
(or gift certificates) to the March of Dimes 
(altruistic model) or one who retained all of 
his winnings for himself. One-third of the chil- 
dren within each of the two conditions heard 
the model exhort charitable actions (e.g., “It’s 
good to give"), another third heard exhorta-
tions for greedy actions (e.g., “It’s not so
good to give"), and the remaining third heard a
normatively neutral conversation (e.g., “This
game is fun"). Thus, children, typically in the
second to fourth grades, were exposed to an 
altruistic or greedy peer or adult model, and 
to either one who exhorted charity, greed, 
or held a “neutral” conversation. The hy- 
pocrisy condition was that situation where 
the child heard the model exhort charity
but practice greed. Model inconsistency
was also produced under conditions where
the model preached greed


but practiced 
the child was 
me and claim 


Following th 


dren were interviewed regarding their judg- 
ments concerning the character of the model. 
This served as the measure of model attrac- 
tiveness. Finally, in one experiment, the chil- 
dren were instructed that while playing the 
game they were to leave a “message” for 
another child. These messages were tape-
recorded and employed to assess the cogni-
tions of the children while in the experimental
setting. The results were rather consistent
across this series of experiments. Children
were more likely to make self-sacrificing re- 
sponses if they had witnessed a model do so, 
although the effects were weak. Exhorta- 
tions concerning charity have consistently 
failed to alter the altruistic behavior of the 
child. The filmed model's actions have been
found to be a greater source of influence than
his words upon the observer's behavior. Ex-
hortations were, however, a significant source
determining the child's judgments of the 
model's attraction. While the model's verbal 
allegiance to the “norm of giving” (Leeds, 
1963) did not enhance the child’s conform- 
ity to it, such exhortations did play a signifi- 
cant role, as did the model’s acts, in deter- 
mining the model’s attractiveness. Hypocrisy 
or inconsistency has not yet been found to
produce an effect on either the child’s behav-
ior or his judgment concerning the model.
Thus, the hypocrite gains, while the preacher
of greed loses attractiveness by virtue of his
verbalizations. The preacher of charity and
practitioner of greed is not disparaged; his
character is vindicated simply by his verbal
allegiance to the norm of giving. The
"messages" left by the children were deter-
mined by the model's verbalizations. Children
exposed to either the model who exhorted
charity or greed were more likely to preach
charity than those who heard "neutral" con-
versation. Thus, in effect, the model's actions
determined the observer's behavior, the
model's moral exhortations, the viewers’ mes-
sages to another, while both the model's words
and deeds affected his judged attractiveness.
While the independent effects of words and
deeds of filmed models upon the child is b
evidence from these experiments also suggests
that the relationship between the child’s us
cognitions concerning charity and his suc
Corant behavior is not high. Thus, while the
child may well be stimulated into socially
sanctioned verbalizations by hearing the
model's moral exhortations, his behavior is
under the control of observed actions.

Two recent experiments by Bryan (see Footnote 5) in
which the model actively engaged in anti-


5 Bryan, J. H. The impact of verbal and be-
havioral deviance upon children’s judgments of




social behavior (theft) failed to demonstrate 
that the model's verbalizations of the virtu- 
ousness of honesty affected kindergarten 
through second-grade children's judgments of 
his attractiveness. In one experiment, children 
witnessed an adult model either restraining 
from or yielding to temptation to steal M&M 
candies. As in the series of altruism studies, 
one-third of each group's members either 
heard the model preach restraint, transgres- 
sion, or maintain a neutral conversation. Fol- 
lowing the television exposure, children were 
left alone to play a game. Each child was then 
interviewed about his judgments of the 
model’s “niceness.” Children, whether exposed 
to an indulgent or restraining model, did not 
transgress. The study demonstrated, however, 
that model’s actions affected judgments, while 
the model’s preachings did not. When, how- 
ever, the data were regrouped according to 
the child’s perceptions of the experimental 
treatments, both the model’s actions and 
his preachings were found to be significant 
variables. 

Essentially the same results were obtained 
regarding children’s reactions to an aggres- 
sive peer who was shown stealing candy from 
a younger child. The film model’s verbaliza- 
tions concerning the “badness” of stealing 
failed to affect the subject’s judgments of 
that model’s attractiveness, while the main 
effect of the model’s actions was highly sig-
nificant. In neither of these experiments were
interaction effects of words and deeds found. 
As in the altruism series, hypocrisy effects 
were absent. 


Discussion 


It seems quite clear that models as pre- 
sented in films are capable of evoking a wide 
range of response, from motor actions to 
verbalized preferences, from aggression to 
courage and self-sacrifice. 

Of the studies reviewed, most have been 
addressed to the problems of film influence 
upon aggressive actions. It now seems indis- 
putable that aggression can be elicited by this 
means. The catharsis principle has received 
scant support in the experimental literature, 
while a number of studies have suggested the 
facilitating effect of models upon aggressive 
behavior. Models appear to both demonstrate 


techniques of aggression and to liberate the 
already present skills into action for both 
child and adult. Indeed, if the findings of 
Bandura and Menlove (1968) concerning the 
extinction of children's fears of dogs can be 
generalized, then it is not surprising that 
anxiety and inhibition concerning the ex- 
pression of aggression also will show extinc- 
tion. 

Noteworthy, however, is the relative pau-
city of experiments designed to assess film
effects upon the viewer's assaultive behavior 
on other persons. While a number of studies 
have demonstrated that film aggression in- 
creases children's assaultive behavior toward 
inanimate objects (Bandura et al., 1963a,
1963b; Hicks, 1965), and verbal behaviors 
presumably reflecting anger toward others 
(Albert, 1957; Berkowitz & Rawlings, 1963),
and inanimate objects (Mussen & Rutherford, 
1961), few have been conducted concerning 
such film effects upon eliciting aggression 
toward others. Berkowitz and Geen (1966) 
and Walters et al. (1962) demonstrated in- 
creased physical aggression by adults toward 
contemporaries while only Hanratty, Liebert, 
Morris, and Fernandez (see Footnote 3) have 
found such effects upon children. 

The most promising theory to guide future 
experimental efforts on film effects is that of 
social learning as advanced in the work of 
Bandura (1969b). Efforts generated by this 
theory have demonstrated some of those 
conditions affecting both the learning and 
performances of both adults and children 
through observations of models. While most 
studies on modeling have employed live 
rather than filmed models, there is reason to 
assume that the functional relationships found 
with one type of model will hold for the other. 
Moreover, the impact of both live and film 
models upon behavior in settings other than 
the experimental one has been demonstrated 
(Bandura et al., 1963a; Bryan & Test, 1967; 
Lefkowitz, Blake, & Mouton, 1955) thus pro- 
viding support for the assumption that lab- 
oratory findings pertaining to modeling phe- 
nomena will be generalizable to a variety of 
naturalistic settings. 

In light of the recent work by Bandura 
and his colleagues, it is likely that films may 
be, in the near future, more systematically 
employed to bring comfort and/or control
to a vast number of “emotionally disturbed”
persons, otherwise neglected, for better or 
worse, by the professional mental helper. 
Given the effectiveness of film models in 
teaching new and evoking old responses, 
greater effort directed at their practical uses 
is warranted.

Finally, given the common assumption of
the importance of social norms in governing 
moral action, special note might be taken of 
the series of studies addressed to this issue. 
While no systematic data appear available on 
the important relationship between words and 
deeds as portrayed in contemporary film 
products, aggressive behavior in the service 
of morally commendable ends appears con- 
doned. Apparently, the assumption is made 
that moral goals temper immoral actions. 
Evidence does exist, however, which shows 
the relevance of depicted outcomes associated 
with the aggressive behavior upon both the 
imitation of such behavior (Albert, 1957; 
Bandura et al., 1963a) and judgments per- 
taining to the aggressor (Bandura et al., 
1963b; Zajonc, 1954). Thus, both the imita- 
tion and interpersonal attraction of the trans- 
gressing model may be determined more by 
outcome than by moral principles. Finally,
the series of experiments reported by Bryan
and Walbek (1970) has shown the relative
autonomy of words and deeds in affecting
various behaviors, both verbal and motor,
and the weak relationship between the child's
statements of social norms and his behavioral
allegiance to them. Their investigations sug-
gest the possibility that the aggressive hero
who verbalizes socially sanctioned norms may
well be teaching the observer how to be
brutal and what to verbalize. As suggested
by Mills (1963), it seems necessary to con-
ceptually and experimentally differentiate be-
tween the effects of moral exhortations and
moral behavior to understand film influences.
EMERGENCE OF A TONIC-PHASIC MODEL FOR
SLEEP AND DREAMING: 


BEHAVIORAL AND PHYSIOLOGICAL OBSERVATIONS 


GEORGE S. GROSSER 1 AND ANDREW W. SIEGAL


American International College, Springfield, Massachusetts


The early evidence for, and later evidence against, theories which consider
waking, sleep, and dreaming as discrete states is presented. The newer tonic-

from research on the be-

(CNS), autonomic nervous system
(ANS), musculature, and mentation as observed during the various stages of

nic-phasic events is summarized. The
advantages of the tonic-phasic model over a three-state model are that only

the heterogeneity of Stage-REM

Stages. The conclusion is drawn that, while the older model was valuable in
generating research and methodological innovations, these very developments
have made it obsolete. The tonic-phasic model is not only better suited to
the present state of knowledge, but also promises to be a valuable guide for
further research.


Recent developments in the field of sleep
and dream research have been stimulated by
the appearance of the new tonic-phasic model
as proposed by Moruzzi (1963). In order for
the investigator to gain an appreciation of the
implications of this model for sleep and dream
research, he must be aware of the important
historical developments in this field that laid
the groundwork for the emergence of the
tonic-phasic model. Ernest Hartmann (1967)
has completed a comprehensive survey of the
literature placing emphasis on the physio-
logical aspects of the sleep-dream cycle. Using
a complementary approach, David Foulkes in
The Psychology of Sleep (1966) has re-
ported a number of studies which focus upon
the mentation and behavioral aspects of sleep.
This review refers only to those sleep and
dream studies which bear directly upon the
validity of the tonic-phasic model.


THE EVOLUTION OF THE THREE-STATE
MODEL


In 1953 Aserinsky and Kleitman observed
two types of ocular motility which occurred
during sleep and noted that the two patterns
were each associated with different electro-


1 Requests for reprints should be sent to George S.
Grosser, Department of Psychology, American In-
ternational College, Springfield, Massachusetts 01109.


encephalogram (EEG) tracings. Slow €Y* 
movements appeared during sleep Stages Y 
IIT, IV, and sleep-onset, descending Stage + 
The EEG of these stages (with the exception 
of descending Stage I) is characterized by 
slow-wave, high-voltage patterns, The rap! 
eye movements (REMs), which are compar’ 
able to those observed in a waking subjec 
who is actively attending an object in D 
visual field, occurred periodically througho" 
the night, first appearing after the passage d 
at least 100 minutes of slow-wave sleep. be. 
REMs occurred, however, only in pu 
with fast-activity, low-voltage patterns in ve 
EEG. This pattern has been termed uo 
doxical sleep? (Jouvet, 1961), “ascending 
stage 1” (Foulkes, 1966), the petani 
(Hartmann, 1965), and most recently, Mes e 
REM” (Rechtschaffen & Kales, 1968). T: 
presence of REMs along with the activate? 
EEG, which bore greater resemblance to ec 
ing than to slow-wave sleep led the inves ag 
gators to hypothesize that it was in sat 
REM that dreaming occurred. Subjects E 
were awakened during Stage-REM prose 
significantly greater dream recall (74%) d m 
they did when they were awakened p 
slow-wave sleep ( 7% dream recall), wh!© 


came to be called nonREM (NREM) i 
(Aserinsky & Kleitman, 1953), Since RE 
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did not occur during slow-wave sleep, several 
investigators confirmed the results of the 
original study, relating Stage-REM with high 
dream-recall rates, NREM sleep with very 
low dream-recall rates, and also showing rela- 
tionships between manifest dream content and 
eye movement patterns (Berger & Oswald, 
1962; Dement & Kleitman, 1957b; Dement 
& Wolpert, 1958; Kleitman, 1961; Roffwarg, 
Dement, Muzio, & Fisher, 1962). 

Various investigators focused their atten- 
tion on the behavior of the autonomic nervous 
system (ANS) during sleep (Snyder, 1967).
Some of the parameters which were investi-
gated by these researchers included heart
rate (Kamiya, 1962; Rosenblatt, Zwilling, & 
Hartmann, 1969; Snyder, Hobson, Morrison, 
& Goldfrank, 1964), blood pressure (Kamiya, 
1962; Snyder, 1967; Snyder, Hobson, & Gold- 
frank, 1963; Snyder et al., 1964), respiration 
(Aserinsky & Kleitman, 1953; Kamiya, 1962; 
Snyder et al., 1964), and brain oxygen con- 
sumption (Brebbia & Rechtschaffen, 1968; 
Kety, 1967). A penile erection cycle which 
appeared to be synchronous with Stage-REM 
was also observed (Fisher, Gross, & Zuch, 
1965; Karacan, Goodenough, & Shapiro, 
1966; Korner, 1968). Changes in body tem- 
perature were observed over various sleep 
stages (Rechtschaffen, Cornwall, & Zimmer-
man, 1965; Verdone, 1965). The degree of
pupillary dilation was observed to change
across sleep stages, with the pupils of the eye
becoming myotic during Stage-REM, while,
in NREM stages, they appeared mydriatic
(Bremer, 1935; Jouvet, 1961). Furthermore,
Scott, Wells, Delse, and Feather (1968) ob- 
served a decrease in salivation during Stage- 
REM. The only measure of ANS functioning 
which shows an opposite trend to the above
responses is the galvanic skin response 
(GSR), which appears to be more active 
during NREM sleep than in Stage-REM 
(Kamiya, 1962; Lester, Burch, & Dossett, 
1967). 

The trend which was inferred from these 
data was that, in general, vegetative func- 
tioning appeared somewhat depressed during 
NREM stages, and then suddenly became 
activated and much more variable during 
Stage-REM (Aserinsky & Kleitman, 1953;
Hartmann, 1965, 1967; Kamiya, 1962; 
Spreng, Johnson, & Lubin, 1968). 


By 1967 sufficient data had been accumu- 
lated on the physiological patterns of the 
sleep-dream cycle to indicate to Hartmann 
(1967, 1968) that this cycle clearly deserved 
a place among the body's other metabolism- 
linked cycles in any comprehensive schematic 
of the human organism. Several researchers 
have investigated the behavior of the sleep- 
dream cycle during various stages of the hu- 
man life-cycle (Feinberg & Carlson, 1968; 
Foulkes, Pivik, Steadman, Spear, & Symonds, 
1967; Globus, Gardner, & Williams, 1968; 
Hartmann, 1968; Korner, 1968; Kripke & 
O'Donoghue, 1968; Roffwarg, Muzio, & De- 
ment, 1966). Additional information on the 
cycle resulted from the investigation of the 
physiological aspects of the cycle's phylo- 
genetic development (Hartmann, 1967, 1968). 
The empirical observations which were made 
and discussed by Hartmann (1967, 1968) 
found their complement in Snyder's (1966) 
theoretical discussion of the phylogenetic sig- 
nificance of the sleep-dream cycle's evolution. 

The apparent slow-wave sleep versus Stage-REM
differences in the behavior of the CNS, ANS,
and the musculature (all of which are re-
viewed later), when viewed in the light of
a series of Stage-REM deprivation studies
(Dement, 1960, 1965; Jouvet, 1965), all of
which clearly demonstrated an organismic 
need for Stage-REM sleep, reinforced the 
notion of a trichotomy of psychophysiologic 
stages. Combining these findings with the 
great differences in the rates of dream recall 
which were observed to exist between Stage- 
REM and slow-wave sleep awakenings in 
earlier studies, the tendency to regard these 
states as discrete entities became explicit 
when Snyder stated: 

The physiological characteristics of this phenomenon
prove so distinctive that I consider it a third state
of earthly existence, the rapid eye movement or
REM state, which is at least as different from
sleeping and waking as each is from the other...
[Snyder, 1966, p. 121].


Hartmann's (1965, 1967, 1968) "W" (waking),
"S" (NREM, slow-wave sleep), and "D"
(Stage-REM) terminology is in keeping with
the above three-state model, as are the neuro-
physiological models proposed by Jouvet
(1961) and Rossi, Minobe, and Candia
(1963). Similarly, Jouvet (1969) presented a
thoroughgoing report on the biochemical
correlates of the sleep-dream cycle, as did
Hartmann (1967) in his Biology of Dreaming.


THE ORIGINS OF THE TONIC-PHASIC
MODEL


Although some of the proponents of the 
three-state model recognize the existence of 
tonic and phasic components of Stage-REM 
(Hartmann, 1967; Jouvet, 1965), recent re- 
search indicates that the tonic-phasic model 
may account for many heretofore unexplained 
phenomena of sleep and dreams since the 
model not only emphasizes the differences be- 
tween tonic and phasic phenomena to a far 
greater degree, but assigns greater significance 
to them in the total conceptualization of 
sleep processes. 

The physiological phenomena associated with Stage-REM sleep were originally
dichotomized by Moruzzi (1963) into tonic components, which persist for the
entire duration of Stage-REM, and phasic components, which usually occur in
association with periodic bursts of rapid eye movements within Stage-REM.
Tonic phenomena include the ascending Stage I EEG, the presence of
hippocampal theta-waves of 5 cycles per second (cps), and the tonic dropout
of muscle potential in the antigravity muscles. Phasic phenomena include the
occurrence of ponto-geniculate-occipital (PGO) spikes, short-term
fluctuations of autonomic behavior, and the presence of myoclonic twitches
and increased fine muscle activity. The tonic phenomena represent the
background upon which phasic activation is superimposed (Hartmann, 1967;
Molinari & Foulkes, 1969). The division of Stage-REM into its tonic and
phasic components, which is dictated by many empirical findings, creates
several theoretical problems which the three-state theory is not equipped to
handle (in its unmodified form). These empirical findings fall into two basic
categories: (a) much of Stage-REM does not resemble that highly activated
phase of sleep which is more accurately referred to as phasic Stage-REM; and
(b) several phenomena which are clearly associated with phasic Stage-REM can
be observed to occur in NREM sleep stages. These phenomena include
observations on both psychological and physiological dimensions. For example,
high amplitude PGO spikes which occur phasically throughout Stage-REM in
conjunction with REM bursts (Pompeiano, 1967) have been demonstrated to occur
in the lateral geniculate nucleus (LGN) during NREM stages of sleep (Thomas &
Benoit, 1968). This PGO spiking (NREM LGN activation waves) becomes more and
more frequent as the following Stage-REM period approaches. This is an
example of a phasic component of Stage-REM, which is associated with
Stage-REM dreaming, but which can, however, occur in NREM sleep. This tends
to blur the three-state theorist's distinction between REM and NREM stages.
Similarly, the phasic k-complexes, which seem to correlate with phasic
electromyogram (EMG) suppressions (Pivik, Halper, & Dement, 1969a), occur
most frequently in Stage II (Foulkes, 1966; Larson & Foulkes, 1969; Pivik et
al., 1969a, 1969b). Heart-rate responses, finger vasoconstrictions, and
galvanic skin responses are temporally correlated with k-complexes, but such
ANS responses were not correlated with isolated sleep spindles (Johnson &
Karpan, 1968). "It is clear that some physiological model is required which
recognizes both the inhomogeneity of Stage-REM and the similarity of some
epochs of Stage-REM to much of NREM sleep [Molinari & Foulkes, 1969, p. 281]."

The evidence which dictates the employment of the tonic-phasic model in any
conceptualization of sleep activity is derived from five basic lines of
research: (a) the behavior of the autonomic nervous system, (b) muscular
activity during sleep, (c) data on sleep mentation, (d) the functional
neuroanatomy of sleep and dreaming, and (e) the findings on the result of the
deprivation of phasic activation.


Autonomic Nervous System 


The trends noted by the majority of the earlier investigators suggested both
a rise in the measures of central tendency and in the variability of
autonomic responses during Stage-REM, with the change in variability being
the more outstanding (Foulkes, 1966; Hartmann, 1967; Kamiya, 1962; Snyder,
1966, 1967). Through the use of more sophisticated analytical techniques and
more precise measurement procedures, more recent investigations (e.g.,
Gassel, Ghelarducci, Marchiafava, & Pompeiano, 1964) have made
the older data more meaningful. Gassel et al. 
(1964) reported bradycardia (slowing of 
heart rate) shortly after the onset of Stage- 
REM. They also noted, however, tachycardia 
(increase of heart rate) and a rise in blood 
pressure shortly after the onset of the first 
REM burst. This tachycardia and rise in
blood pressure were then followed by a re- 
duction of blood pressure and bradycardia, 
thereby clarifying the variability attributed 
to Stage-REM by earlier investigators. Al-
though the bradycardia and fall in blood pres-
sure which were reported by Gassel et al.
(1964) appear discordant when compared
with the increase in heart rate and rise in 
blood pressure which were reported by others 
(Kamiya, 1962; Snyder, 1966, 1967; Snyder 
et al., 1963, 1964), the increased variability 
of these measurements within Stage-REM re- 
mains a point of agreement. In attempting to 
reconcile the apparently discrepant data of 
these researchers, one must examine their re- 
search methodologies. The use of averaged 
data, as employed in earlier investigations, 
may have been misleading since the peaks 
and troughs of heart rate tend to occur in 
clusters spaced discretely throughout Stage- 
REM (Rosenblatt et al., 1969) and could 
not be discerned precisely since moment-to- 
moment changes are frequently obscured by 
the use of averaging techniques (Gassel et al., 
1964). These findings, correlating episodes of 
increased autonomic variability in conjunc- 
tion with bursts of REM, were confirmed by 
Spreng et al. (1968) and Aserinsky (1967).
This led Aserinsky to divide Stage-REM into
two sectors, which he called REM-M and
REM-Q (Aserinsky, 1967). REM-M refers
to segments of Stage-REM sleep which con-
tain REM bursts (periods of ocular motility),
accompanied by phasic autonomic arousal,
whereas REM-Q refers to tonic Stage-REM,
which is characterized by relative quies-
cence of the oculomotor musculature, and
more tonic, less variable levels of ANS func- 
tioning. 
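
The argument that averaging can hide phasic change may be made concrete with a small numerical sketch. The heart-rate values below are hypothetical and chosen only for illustration; they are not data from Gassel et al. (1964) or from any other study cited here, and the function name `epoch_means` is likewise an assumption. A brief tachycardia followed by an equal bradycardia around a single assumed REM burst leaves each epoch average unchanged, although the moment-to-moment range is large.

```python
from statistics import fmean


def epoch_means(samples, epoch_len):
    """Average consecutive, non-overlapping epochs of `epoch_len` samples."""
    return [fmean(samples[i:i + epoch_len])
            for i in range(0, len(samples), epoch_len)]


# Hypothetical heart rate (beats/min), one sample per second, for 60 s
# of Stage-REM. All values are illustrative assumptions.
TONIC_RATE = 60.0
heart_rate = [TONIC_RATE] * 60
for t in range(20, 25):   # assumed tachycardia at the onset of a REM burst
    heart_rate[t] = 70.0
for t in range(25, 30):   # assumed bradycardia that follows it
    heart_rate[t] = 50.0

averaged = epoch_means(heart_rate, 30)            # two 30-s epoch averages
phasic_range = max(heart_rate) - min(heart_rate)  # moment-to-moment spread

print("epoch averages:", averaged)
print("moment-to-moment range:", phasic_range)
```

In this constructed case both epoch averages equal the tonic level, so an analysis based on averages alone would report no change, while the sample-by-sample record shows a 20 beats/min swing confined to the burst.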

Research by Karacan et al. (1966)
provides a further example of phasic vari-
ability which is superimposed upon a tonic
component of Stage-REM. A cycle of penile 
erection occurring synchronously with Stage- 
REM (erection generally lasting for the dura-
tion of the Stage-REM period; Fisher et
al., 1965) has been demonstrated. Karacan
et al. (1966), however, showed that upon 
the general tonic level of erection, phasic 
tumescence and detumescence correspond to 
the anxiety content of the subject's dream. 
The observed phasic detumescence was as- 
sociated with high anxiety content (Karacan 
et al., 1966). 

The GSR observed during sleep has con-
sistently been a "thorn in the side" of the
three-state theorist, since it is the only index 
of autonomic arousal to show greater activity 
outside of Stage-REM (NREM) than during 
Stage-REM itself (Kamiya, 1962; Lester et 
al., 1967; McDonald, Shallenberger, & Car-
penter, 1968). The occurrence of spontaneous
"storms" of GSR activity outside of Stage- 
REM suggests the presence of intense au- 
tonomic arousal existing outside of Stage- 
REM, which is inconsistent with the three- 
state model’s conception of slow-wave sleep 
as containing only depressed autonomic func- 
tioning. 


The Musculature 


Although a wealth of studies have been 
conducted which describe the behavior of the 
musculature during sleep (see Dement, 1966; 
Hartmann, 1967), selection of a few critical 
studies illustrates the need for more inclusive 
explanations than can be offered by a simple 
three-state theory, based upon a REM- 
NREM dichotomy. One advantage of the 
tonic-phasic model is that it enables us to ex-
plain (a) both the continuities and varia-
bilities observed within Stage-REM and (b)
the continuities of phasic phenomena oc- 
curring outside of Stage-REM with those 
within Stage-REM (Molinari & Foulkes, 
1969). Jouvet (1965, 1967) observed a drop-
out of muscle potential in the dorsal neck
muscles (nuchal) of sleeping cats during
Stage-REM. Berger (1961) went on to ob- 
serve a decrease in the tonus of the extrinsic 
laryngeal musculature. These two studies rep- 
resent the establishment of tonic muscular 
phenomena which persist throughout Stage- 
REM. 

FIG. 1. Sequentially diphasic behavior of the
musculature within Stage-REM sleep. Key: (1) The
average number of gross body movements (GBMs)
is linearly, and inversely, related to sleep stage
(Kamiya, 1962). (2) The "chapter marker effect"
(occurrence of gross body movements between REM
bursts; Dement & Wolpert, 1958). (3) Concurrence
of fine muscular activity (fma)
(Baldridge et al., 1965).

Turning to an examination of phasic mus-
cular activity within Stage-REM, we first
consider a pioneer study by Kamiya (1962),
in which he correlated EEG stage with the
observed frequency of gross body movements
(GBMs). Kamiya found the highest incidence of GBMs in Stage-REM, somewhat
fewer GBMs in Stage II, and so on in Stages III and IV, respectively. Dement
(Dement & Kleitman, 1957a; Dement & Wolpert, 1958) provided fine tuning on
Kamiya's observation of an increase in the frequency of GBMs in Stage-REM.
Dement noted that GBMs occurred with significantly greater frequency
immediately preceding and immediately following bursts of rapid eye movements
(REMs), but GBMs did not appear simultaneously with REM bursts. This effect
has been referred to as the chapter-marker effect, since subjects report
multiple-scene, fragmented dreams following awakening from Stage-REM periods
in which bursts of REMs have been separated by GBMs. Subjects awakened from
Stage-REM periods in which REM bursts were not interrupted by GBMs usually
recall a dream that consists of a "one-act play." Further progress was made
by Baldridge, Whitman, and Kramer (1965), using "highly sensitive
semiconductor strain gauges" which permitted the detection of fine muscular
activity (fma) during sleep. Their research showed that integrated fine
muscle activity coincided with the presence of REMs. The results of these
three studies are described in Figure 1.
What becomes clear as one peruses Figure 1 is that the muscular activity
within Stage-REM appears in a sequentially diphasic pattern, with phasic GBMs
occurring immediately before and after bursts of REMs, and fma occurring
concurrently with REM bursts. Gassel, Marchiafava, and Pompeiano (1964a)
confirmed the previous studies using an electromyographic (EMG) technique, in
which they observed tonic EMG suppression throughout Stage-REM, upon which
were superimposed phasic inhibition and myoclonic twitches.

The following studies also demonstrate the ability of the tonic-phasic model
to accommodate data which illustrate the existence of phasic events
throughout the sleep cycle. Bruxism (tooth grinding), originally believed to
be a phasic concomitant of REM sleep (Reding, Rubright, & Rechtschaffen,
1964), was recently observed to occur in all stages of sleep, with its
greatest frequency observed in Stage II, often accompanied by GBMs (Reding,
Zepelin, Robinson, Smith, & Zimmerman, 1968).

Similarly, Gassel et al. (1964a) observed phasic EMG suppression during
Stage-REM, just as Pivik et al. (1969a) observed phasic EMG suppressions
throughout all sleep stages, with the greatest frequency occurring in Stages
II and IV. Both of these sets of studies clearly demonstrate the occurrence
of phasic muscular activity in NREM sleep, the kind of activity earlier
attributed to Stage-REM sleep exclusively.


Mentation 


In the early phases of research, data on sleep mentation also contributed to
the creation of what appears now to be an artificial dichotomization of REM
and NREM sleep. This artificial dichotomy appears to be the result of two
basic trends in the earlier phases of sleep research: (a) the extreme
differences in the rate of mentation recall experienced in REM versus NREM
sleep and (b) the tendency to attribute differences of a qualitative nature
to REM and NREM mentation. Dement and Kleitman (1957b) reported 80% dream
recall obtained from Stage-REM awakenings and only 7% dream recall from
NREM awakenings. Jouvet, in 1960 (cited in
Hartmann, 1967, p. 12) found recall in 60% 
of his Stage-REM awakenings and only 3% 
recall in awakenings from other sleep stages. 
Foulkes (1966), however, in 1960, began an 
investigation specifically designed to investi- 
gate the mentation of NREM sleep stages. 
Although he confirmed earlier reports of high 
frequencies of dream recall in Stage-REM 
awakenings (87%), he also found 74% recall
in NREM awakenings. Similarly Pivik et al. 
(1969b) reported 87.5% recall from REM 
awakenings, and 55% from NREM. This 
discrepancy between the results of Dement 
and Kleitman (1957b) and Jouvet (Hart- 
mann, 1967, p. 12), which indicated very low 
estimates of NREM mentation, and the re- 
sults of Foulkes (1966) and Pivik et al. 
(1969b) can be explained on the basis of 
experimental methodology (Foulkes, 1966; 
Hartmann, 1967) and Foulkes’ findings on 
the nature of "NREM mentation." Menta-
tion obtained from awakenings made in sleep
stages other than Stage-REM (NREM awak-
enings) was described as less well recalled,
more thoughtlike, less vivid, less visual, more 
conceptual, and less bizarre and emotional 
than reports taken from Stage-REM awaken- 
ings (Foulkes, 1966; Rechtschaffen, Vogel, & 
Shaikun, 1963). The differences between the 
high and low NREM recall rates can now be 
explained. If, upon a NREM awakening, the 
experimenter asks a subject to report his 
"dream" or asks if the subject is “dreaming” 
and the subject's mental experience was of the 
less visual and bizarre nature of "typical" 
NREM mentation, the subject may simply 
say that he is not dreaming. Even if he should 
report his less “dreamlike” NREM mental
production, the scorer, using a strict criterion 
of dreaming, might not count this recall at all 
since it is not a “dream,” and it is only 
dreams that the experimenter is interested in 
studying (Foulkes, 1966). Even the distinc-
tion drawn by Foulkes (1966) and Recht-
schaffen et al. (1963) between the qualitative
differences between Stage-REM and NREM 
Stages’ mentation is too clear-cut and di- 
chotomous, and must be qualified (Foulkes, 
1966, 1969a, 1969b; Foulkes, Spear, & Sy- 
monds, 1966; Pivik & Foulkes, 1968), since 
it has been shown that NREM reports can, 
at times, be almost indistinguishable from 


Stage-REM reports (Foulkes, 1966, 1970a, 
1970b; Pivik & Foulkes, 1968). The degree 
of “dream-likeness” of NREM mentation has 
been observed to depend not only upon indi- 
vidual differences in personality (Pivik & 
Foulkes, 1968), but also upon latency of 
awakening (Pivik & Foulkes, 1968; Shapiro, 
Goodenough, Lewis, & Sleser, 1965), and time
of night (Foulkes, 1966; Pivik & Foulkes, 
1968). “Dreams from the first REM period 
of the night have generally been found to be 
less vivid than those reported from later 
REM periods. . . . Similar findings have been 
reported for successive periods of NREM 
sleep [Foulkes, 1970b, p. 165]." The dream-
likeness of material retrieved from sleep onset
awakenings has been shown to be affected by 
subject variables to an even greater degree 
than post-sleep-onset NREM awakening re- 
ports. This suggests that “the possibility is 
opened up that a psychological continuum 
within Stage-REM may partially overlap with 
one in NREM sleep [Molinari & Foulkes, 
1969, p. 347]." This continuity of REM and 
NREM mentation becomes even clearer when 
we consider that some degree of content in- 
terrelatedness has been observed (Offen- 
krantz & Rechtschaffen, 1963; Rechtschaffen 
et al., 1963; Trosman, Rechtschaffen, Offen-
krantz, & Wolpert, 1960; Verdone, 1965).
The above data on sleep mentation are clearly 
incongruent with the three-state theory in its 
unmodified form, but can be accommodated 
within the scope of the tonic-phasic model 
since they demonstrate (a) a continuation of
a correlate of Stage-REM (dreamlike menta-
tion) into NREM sleep stages and (b) extreme
variability of mentation within Stage-REM
periods.

Further evidence of variability within single Stage-
REM periods has emerged from the work of
Aserinsky (1967), who divided
Stage-REM into subphases using the presence 
of REMs as his descriptive index. Stage-REM 
episodes in which REMs are present were 
referred to as REM-motile (REM-M), while 
Stage-REM episodes characterized by ocular 
quiescence were termed REM-Q. REM-M 
corresponds to phasic Stage-REM, REM-Q 
to its nonphasic state. Aserinsky then sug- 
gested that these two physiologically differ- 
ent states might each support different “levels 
of dreaming” (Aserinsky, 1967). This hy-
pothesis was independently put forth in 1967
by Cohen, Di Bianco, Fecci, and Shapiro
(1968). Molinari and Foulkes (1969) experi-
mentally tested Aserinsky’s hypothesis in a
recent paper entitled “Tonic and Phasic 
Events during Sleep; Psychological Correlates 
and Implications," in which they observed 
markedly different types of mentation in- 
digenous to the REM-Q and REM-M sectors 
of Stage-REM. REM-M was found to con-
tain dream reports which were characterized
by the predominance of "primary visual ex-
perience (PVE)," whereas REM-Q reports con-
tained either (a) PVE in addition to active
cognitive processes which interpret and re-
flect upon the PVE, or (b) purely cognitive
processes in the absence of visual imagery.
Both PVE which has undergone secondary
cognitive elaboration and purely cognitive
processes were therefore scored as "secondary
cognitive elaboration" (SCE; Molinari &
Foulkes, 1969).

Of the reports obtained from REM-M 
awakenings, 88% were scored PVE, while 
REM-Q contained 20% (p < .005). Reports 
from 80% of the REM-Q awakenings were 
scored SCE; REM-M contained only 12%.
The interjudge agreement for two experi- 
menters’ scoring of SCE versus PVE menta- 
tion-reports was 95% for REM-Q reports and 
100% for REM-M reports. The fact that re- 
liable ratings could be made demonstrated 
the extreme heterogeneity of Stage-REM 
mentation. This is congruent with the division 


of Stage-REM into phasic and tonic subdivi-
sions.
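
If the interjudge agreement figures are read as simple percentage agreement, the computation can be sketched as follows. This is a minimal illustration under that assumption: the ratings listed are hypothetical, and the function name `percent_agreement` is introduced here for illustration and is not part of the procedure reported by Molinari and Foulkes (1969).

```python
def percent_agreement(judge_a, judge_b):
    """Percentage of reports that two judges placed in the same category."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must rate the same non-empty set of reports")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


# Hypothetical ratings of 20 reports; each report is scored either
# "PVE" or "SCE". The judges disagree on exactly one report.
judge_a = ["SCE"] * 16 + ["PVE"] * 4
judge_b = ["SCE"] * 15 + ["PVE"] * 5

agreement = percent_agreement(judge_a, judge_b)
print(f"interjudge agreement: {agreement:.1f}%")
```

Percentage agreement does not correct for agreement expected by chance, which is one reason the figure should be read alongside the category base rates.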


In terms of the SCE-PVE classification REM-Q
awakenings did not differ significantly from NREM
ascending stage II (NR-A2), sleep onset Stage II
(SO-2), or sleep onset stage I (SO-1) awakenings.
Content recalled from REM-M awakenings was
qualitatively different ... from all other awakening
categories [Molinari & Foulkes, 1969, p. 353].

These findings clearly illustrate, with data on
the psychological dimension, one of the areas of
emphasis of the tonic-phasic model: intra-Stage-REM
variability.

The Functional Neuroanatomy of Sleep and Dreaming


Up to this point we have presented data which exemplify both tonic and phasic
activities of several organ systems during sleep. Just as the central nervous
system serves as the integrating mechanism for physiological processes, an
understanding of the tonic and phasic activities during sleep may be
facilitated if one regards central neural mechanisms as a conceptual scheme
which organizes and unifies various sleep phenomena, peripheral and mental.
In 1949, Moruzzi and Magoun reported that the total destruction of the
midbrain reticular formation produced animals showing prolonged slow-wave
sleep. This system came to be known as the ascending reticular activating
system (ARAS). It was considered to be the center for wakefulness.
More recently, great progress has been made, in the laboratories of Jouvet
(196?) and Moruzzi (1964), among others, in specifying the nuclei and tracts
involved in arousal, NREM or slow-wave sleep, and Stage-REM. A high
transection of the brain stem, at the rostral midbrain, yielded the slow-wave
and sleep-spindle EEG that typify slow-wave sleep. This effect could not be
attributed to the ascending inhibitory system, the cell bodies of which are
located in the medulla. It was explained by the action of nonspecific medial
thalamic nuclei (Hernández-Peón, 1965; Moruzzi, 1963). The thalamic nuclei
activate such a large number of cortical neurons, and have such an extensive
synchronizing effect on these cells, that the ensuing cortical EEG takes on a
high-voltage, low-frequency form. For evidence of regularly timed firing of
cells in the medial thalamus and the ability of cortical cells to keep in
step during slow-wave sleep, see Verzeano and Negishi (1961).
A negative feedback loop of inhibitory impulses from the stimulated cortex
back to the stimulating nonspecific medial thalamus may participate in the
synchrony by elevating thresholds across the board. As the medial thalamic
neurons recover together, they simultaneously activate many cortical cells.
Evidence of negative feedback loops involving the cortex and lower brain
regions is found, for instance, in Dell, Bonvallet, and Hugelin (1961).
Evidence that most cortical cells fire more frequently during Stage-REM sleep
than they do during wakefulness has been amply provided by Evarts (1961,
1965, 1967).

With connections between thalamus and
brainstem intact, however, the initiation of 
sleep can occur within the medulla. Stage- 
REM sleep was attributed by Jouvet to the 
activity of a particular nucleus within the 
pontine reticular formation (RF), the nu- 
cleus reticularis pontis caudalis (NRPC). 
Coagulation of this nucleus, and some nearby 
cell bodies, eliminated Stage-REM sleep and
permitted only wakefulness and slow-wave
sleep to occur (Jouvet, 1961). The rostral 
continuation of the NRPC is the nucleus 
reticularis pontis oralis (NRPO; see Brodal,
1957, Figure 1, pp. 10-11). After studying the 
effects of a series of hemisections of the pons, 
Rossi et al. (1963) modified Jouvet's “Stage-REM
center” somewhat, since their results suggested
that the Stage-REM influences originate
in the rostral part of the NRPC and the caudal
part of the NRPO. If lesions were made still
more rostrally, slow-wave sleep predominated
and Stage-REM sleep did not occur. This
would seem to isolate the region of the
boundary of these two nuclei as the center 
for Stage-REM sleep. The unilateral symp- 
toms (produced by the hemisections) were 
cleared up after several days, a process re- 
ferred to by Rossi et al. as “compensation.” 
Since NRPC and NRPO neurons rarely have 
long axons with rostral terminals, Rossi et al. 
suggested that a detour around the unilateral 
lesion may have been established, involving 
the activity of short-axon neurons with col- 
lateral fibers that cross the midline. 

Magnes, Moruzzi, and Pompeiano (1961) 
have shown that the source of slow-wave 
sleep in the medulla is the nucleus of the 
tractus solitarius in the caudal medulla. Con-
trol lesions ruled out any role in slow-wave
sleep for the medial lemniscus and the gracile 
and cuneate nuclei, which are neighboring 
structures. The effect occurs in cats with the 
brainstem separated from the spinal cord, 
ruling out ascending influences from the cord. 

The studies described thus far were claimed 
to have isolated a center for wakefulness in 
the ARAS, a center for slow-wave sleep in 
the medulla, and a center for Stage-REM 
sleep in the pons. Thus the evidence of these 
studies tends to support the three-state model. 

Data on structure-function relationships 
responsible for the tonic and phasic com-
ponents of Stage-REM sleep highlight the
need to draw a sharp distinction between
these two categories. The tonic phenomena
associated with Stage-REM are not distin- 
guished merely by temporal criteria. They 
are based on distinctly different anatomical 
mechanisms from those responsible for the 
phasic manifestations of Stage-REM. Jouvet 
(1967) established the specific relation of the 
locus coeruleus (LC) to the tonic phenomena 
of Stage-REM sleep. Bilateral coagulation of 
the LC, while abolishing tonic phenomena 
such as fast, low-voltage Stage-REM EEG, 
permitted the emergence of a slow-wave sleep 
EEG concomitant with what are normally 
phasic components of Stage-REM such as
rapid eye movements and muscular twitches. 
This result illustrates the neuroanatomical in- 
dependence of tonic and phasic Stage-REM 
mechanisms and strongly suggests that their 
behavioral correlates should be considered in-
dependent as well.

In a series of studies, O. Pompeiano and 
his co-workers isolated the neural basis of 
several phasic aspects of Stage-REM. From 
studies combining the methods of ablation 
and microelectrode recording, Marchiafava 
and Pompeiano (1964) found that a phasic 
barrage of impulses originating in the somato- 
sensory cortex occurs during REM bursts. 
Bizzi, Pompeiano, and Somogyi (1964), in a 
single-unit study of the activity of vestibular 
nuclei during Stage-REM, established that 
the frequency of nerve impulses rose sharply 
in two of the four nuclei (namely, the medial 
and descending) whenever a REM burst oc- 
curred. Pompeiano (1967) followed up this 
finding by ablating the medial and descend- 
ing vestibular nuclei (MVN and DVN). This 
resulted in a removal of the phasic phenomena 
of Stage-REM. There were no REMs. The 
phasic PGO spikes (very high amplitude 
spindles recorded from pons, lateral genicu- 
late nucleus—LGN—and occipital cortex— 
OC) were also removed. A somewhat lower 
voltage, tonic pattern of PGO waves, character-
ized by greater rhythmicity, remained. Thus
the pons and other structures show function- 
ally the presence of separate tonic and phasic 
pathways. 

In a careful investigation of PGO activity, 
Bizzi and Brooks (1963) obtained evidence 
that the pontine response might cause the 
spiking observed in the higher visual system
structures (LGN and OC), but the activity
of these structures could not, in turn, trigger 
the pons. This unidirectional effect indicates 
that activity in the MVN and DVN (a) 
causes REMs and (b) either causes, or is 
closely related to the causes of, the fast, low- 
voltage, desynchronized activity in the higher 
levels of the visual nervous system. This in
turn suggests that the tonic-phasic dichotomy 
may be more relevant to an experimental 
clarification of dreaming than the Stage- 
REM-NREM dichotomy. 

Two important implications of rapid eye 
movements can be considered. First, REMs 
represent an adjustment of a sensory ap- 
paratus. Second, this adjustment is a phasic 
activity associated with Stage-REM sleep. 
There is no reason to doubt that the REM 
activity of the visual system is paralleled by 
phasic adjustments in other sense modalities, 
such as hearing. Evidence supporting this
line of reasoning was provided by Dewson, 
Dement, and Simmons (1965). Recordings 
from two cats with electrodes implanted in 
the middle ear musculature, along with EEG 
and neck muscle EMG electrodes, yielded 
the direct evidence that middle ear muscles
also show phasic activation during REM
bursts. When tested with pure tones, two
other cats with electrodes implanted near
the inner ear showed a reduction in the am-
plitude of electrical responses from the
cochlea which were concomitant with REM
bursts. This attenuation of the cochlear mi-
crophonic is partly due to the contraction of
middle ear muscles; see also Williams, Tepas,
and Morlock (1962). It is now evident that
the indirect inner ear evidence is consistent
with the direct middle ear evidence. Clearly,
the middle ear musculature behaves much like the
extra-ocular musculature during Stage-REM,
each phasically adjusting a specific sense
receptor. Moreover, both of these reactions
originate in the brainstem. The motor nuclei
of the fifth and seventh cranial nerves send 
efferent fibers to the middle ear muscles, just 
as the vestibular nuclei send motor impulses 
to the extrinsic muscles of the eyes. 

A consideration of the efferent (motor) 
systems yields observations which are quite 
analogous to the afferent system phenomena 
described above. Our present knowledge of
the behavior of these efferent systems is
largely the result of the efforts of Gassel and
his colleagues. Gassel et al. (1964a) studied
the myoclonic twitches that occur during
Stage-REM. These twitches represent a spe-
cial paradox within paradoxical sleep. During
the desynchronized EEG phase of sleep, there
is an overall loss of tone in the antigravity
muscles, including the nuchal muscles of
the nape of the neck which are often used as
criterion measurements of Stage-REM. Dur-
ing a REM burst, there is a further loss of
tone in these muscles, a phasic inhibition
superimposed upon the tonic one. Neverthe-
less, paradoxically, myoclonic twitches are
observed mainly during REM bursts, despite
the presence of both tonic and phasic in-
hibition. Gassel et al. sought to discover a
lesion that would eliminate the twitches. In
their search, they established that the muscle
spindle-gamma efferent reflex arcs were not
involved in the twitches. Also eliminated were
the vestibulospinal and pyramidal tracts. The
effective lesion turned out to be in the [...]
half of the lateral funiculus of the spinal
cord. It is known that reticulospinal fibers
travel through that region, and it is precisely
these fibers which appear to be responsible
for the occurrence of REM bursts and myo-
clonic twitches. Specifically, these fibers origi-
nate in the descending inhibitory reticular
formation of the medulla. Therefore, as is
the case with afferent phasic activity, a
brainstem area is the source of the
fibers which are responsible for the phasic
activation of the musculature during sleep.
Although the afferent and efferent systems
show certain similarities in their patterns of
phasic activation (for example, their de-
pendence upon hindbrain structures as well as
their temporal concurrence), it nevertheless
can be shown that some differences in their
patterns of phasic activation also exist. Using
the reflex as a miniature unit of behavior, the
afferent and efferent fiber responses to
joint electrical stimulation can be compared
with one another. Studies by Gassel et al.
(1964b) and Hodes and Dement (1964)
demonstrated that phasic inhibition of spinal
reflexes occurs during REM sleep and that such
inhibition is attributable to brainstem struc-
tures. Electrical stimulation of peripheral
nerves can be suprathreshold for afferent
fibers and subthreshold for efferent fibers.
This makes possible the electrical induction of
reflex responses, and the method has the 
further advantage of providing a built-in 
measure of reflex thresholds. Hodes and De- 
ment (1964) and Gassel et al. (1964a, 1964b) 
have shown that the thresholds of such re- 
flexes are raised during Stage-REM. 

Gassel et al. (1964) observed a correlation 
between phasic neuromuscular phenomena 
and such autonomic functions as blood pres- 
sure and heart rate. The fall in blood pres- 
sure was not abolished by a sympathectomy 
of the relevant sympathetic ganglion, the 
cervical. This ruled out direct sympathetic 
stimulation as the cause of the vasomotor re- 
sponse. The experimenters attribute the pupil- 
lary dilation observed during REM bursts to 
inhibition of the parasympathetic activity of 
the Edinger-Westphal nucleus, which is in- 
volved in the regulation of eye movements. 
They reason, by analogy with this, that a 
similar phasic inhibition of parasympathetic 
vasomotor controls must cause the drop in 
blood pressure during REM episodes. 

The autonomic phenomena parallel those 
of the eye muscles, inasmuch as there is an 
overall tonic inhibition of blood pressure dur- 
ing REM sleep accompanied by transient 
changes occurring during REM bursts. Some- 
times a momentary increase of blood pressure 
and phasic tachycardia (heart-rate speedup) 
are observed at the very onset of a REM 
burst, but shortly after onset, the general ob- 
servation is one of further blood pressure 
reduction and bradycardia (heart-rate slow- 
down) (Gassel et al., 1964). 

With regard to the tonic and phasic phe- 
nomena of Stage-REM, there seems to be a 
special nucleus for the triggering of the tonic 
responses, namely, the LC, while several 
tracts and nuclei are responsible for the sev- 
eral varieties of phasic responses, namely, the 
MVN, the DVN, the vasomotor center, the 
reticulospinal tract, etc. In order to do justice 
to the complexity of sleep phenomena, the in- 
dividual mechanisms for individual sleep re- 
sponses must be clarified, as well as those 
mechanisms that are common to all of them. 
As Sprague put it, “it is not to deny the 
presence of functional localization to say 
that the integration of different attributes of 
a function is accomplished at many foci and 
levels of the central nervous system [Sprague,
1967, p. 183].” It is precisely this considera-
tion that makes it difficult to establish defini-
tive centers for specific sleep phenomena.


REM Deprivation 


What we have done thus far is to clarify 
and reorganize the literature on sleep and 
dreams which was used as support for the 
three-state model of psychophysiological ex- 
istence, which rested upon the dichotomiza- 
tion of REM and NREM sleep stages. We 
have attempted to show how these studies, if 
viewed in the light of more recent and sensi- 
tive research, tend to be accommodated more 
easily by the tonic-phasic model. This trend 
can be shown perhaps most dramatically by a 
recent deprivation study by Ferguson, Hen- 
riksen, McGarr, Belenky, Mitchell, Gonda, 
Cohen, and Dement (1968). 

It has long been known that Stage-REM 
deprivation, caused by awakenings at the
onset of Stage-REM, results in marked be-
havioral changes and may eventually even
bring about the death of the animal (De- 
ment, 1965; Dement, Henry, Cohen, & Fergu- 
son, 1967; Jouvet, 1965). The subsequent 
recovery of lost REM time (“REM-rebound”)
(Dement, 1960, 1965; Jouvet, 1965) was 
taken as proof of an organismic need for 
Stage-REM sleep (Hartmann, 1967; Jouvet, 
1965). 

A recent study by Ferguson et al. (1968)
has demonstrated that phasic events which
are indigenous to all stages of sleep seem to
be the crucial element in the need for Stage-
REM sleep. Stable
base-line data were taken from five cats.
Each cat was used as its own control in a
counterbalanced design, and selectively de- 
prived of either Stage-REM sleep or of PGO 
spikes which occurred both in REM and 
NREM sleep stages. (Six days of recovery 
were used between treatments.) 

Those subjects who were REM deprived 
later showed less REM-rebound than the 
subjects deprived specifically of PGO spikes. 
In both kinds of deprivation, the rebound of 
REM sleep contained more than the usual 
number of phasic events, showing that there 
is a trend toward quantitative compensation 
for lost phasic activities. If enough PGO spikes 
were permitted to occur during NREM sleep, 
there was no makeup of REM sleep at all.
(This may account for some of the “excep-
tional” subjects who failed to show REM-re-
bound after REM deprivation, as observed by
Dement, 1960.) The results clearly demon-
strate that the crucial factor in REM depriva-
tion-compensation lies in the loss of phasic 
events. It may even be the case that REM 
time per se is irrelevant (Ferguson et al., 
1968). 


Discussion 


The tonic-phasic model accommodates data 
which cannot be explained easily by the 
three-state model. Specifically, it covers these 
phenomena: (a) the heightened variability of 
a number of mental and physiological mea-
sures within the Stage-REM period, and (b)
the continuity which some of these measures 
show throughout sleep. The new model seems 
to be a more productive guide for researchers 
in the field of sleep and dreams. The three- 
state model was facilitative inasmuch as it 
generated a great deal of research. It served 
as a useful “orienting response” in the early 
stages of the development of an uncharted 
and exciting field. In the years which have 
passed since the original discovery of Aserin-
sky and Kleitman (1953), tremendous quan-
tities of data were accumulated and impres- 
sive advances were made in recording and 
scoring techniques, permitting new and more 
effective conceptualizations to evolve. It is
our belief that the emergence of the tonic- 
phasic model heralds a trend toward greater 
specificity in the focus of attention of sleep 
and dream researchers.


Stimulus selection in paired-associate learning is assumed to be the result of
an active organized process. From this standpoint, a methodological analysis
and review of the studies of stimulus selection are presented. Some sugges- 
tions are made for further studies of stimulus selection. 


Underwood (1963) called attention to the 
fact that subjects often select part of the 
nominal stimulus as the functional stimulus. 
He discussed some of the implications of 
stimulus selection and suggested that selection 
occurs in rote paired-associate learning as well 
as in concept formation. 

Shepard (1963), in his comments on 
Underwood's paper, maintained that some 
stimuli (e.g., nonsense trigrams and geo-
metrical figures) are almost inevitably
analyzed into components or dimensions,
while others (e.g., colors and olfactory
stimuli) are almost invariably reacted to as
unanalyzable wholes. He distin-
guished between selective attention or ab-
straction which applies to the former type 
of stimuli and pure stimulus generalization 
which applies to the latter. Shepard also sug- 
gested that functional stimuli, in some cases, 
are not merely constructed from the in- 
dividual nominal stimuli but, in addition, are 
organized into an array or “cognitive map.” 

These discussions emphasized the fact that, 
even during paired-associate learning, human 
subjects are not passive receivers of stimuli, 
but are active selectors and organizers. When 
presented with a complex stimulus, subjects 
abstract part of it for use as the functional
stimulus. Martin (1968) has extended the
concept of cue selection and Shepard’s (1963)
suggestion that meaningfulness is probably
highly correlated with degree of analyzability
of the stimuli to an encoding variability hy-
pothesis. He argued that there are more
alternative encoding possibilities for low- than
for high-meaningful stimuli and that the sub-
jects’ use of these encoding possibilities under-
lies stimulus meaningfulness effects in paired-
associate learning and also determines what is
available for transfer.

1 The author is grateful to Benton J. Underwood
for his comments on an earlier draft.

2 Requests for reprints should be sent to Jack
Richardson, Department of Psychology, State Uni-
versity of New York, Binghamton, New York
13901.

Trabasso and Bower (1968) have recently 
reviewed the research on cue selection in con- 
nection with an extended series of studies on 
discrimination and classification learning. The 
purpose of the present paper is to concentrate 
on a methodological analysis and review of 
studies of stimulus selection in paired-as- 
sociate learning. 


CUE EFFECTIVENESS, COMPONENT-RESPONSE
DIFFICULTY, AND CUE SELECTION


Both Underwood (1963) and Shepard 
(1963) discussed stimulus selection as an 
activity on the part of the subject. If stimulus 
selection is to be considered basic in more
complicated tasks, such as transfer and con- 
cept formation, then it must be the result of 
an active process in the sense that it must 
be flexible and adaptable enough to change 
with the demands of the task and with 
the results produced by the selection of a 
particular component of the stimulus. For
instance, Martin's (1968) explanation of the
effects of stimulus meaningfulness on negative 
transfer requires that the subject select one 
component of the low-meaningful stimulus 
during first-list learning and a different com- 
ponent, one less subject to negative transfer, 
during second-list learning. 

In most studies of stimulus selection, the 
procedure consists of presenting a list with 
compound stimuli for paired-associate learn- 
ing and then testing for recall and/or transfer 
to the components of the compound. The 
specific questions under consideration have 
not always been clear in these studies. The 
study by Underwood, Ham, and Ekstrand 
(1962) is presented in some detail as an 
example of a stimulus selection study because 
it has apparently been a model for several 
other studies, and most of the problems of 
interpretation were considered by the authors. 

In the Underwood et al. (1962) study, 
paired-associate lists were presented by the 
anticipation method at a 2:2-second rate to a 
criterion of 7/7 correct. In one list, the 
compound stimuli were consonant-vowel-con- 
sonant (CVC) words with each word sur- 
rounded by a distinctive color frame, and 
the responses were the single digit numbers 2 
through 8. In the other list, the trigrams 
were consonant-consonant-consonant (CCCs)
of high formal similarity; the colors and the
color-number pairings remained the same in
both lists. Following learning, the transfer
lists were presented for 10 trials, and sub- 
jects were instructed to give as many correct 
responses as possible on the first transfer trial. 
The transfer lists had either the compounds, 
the colors, or the appropriate trigrams as 
stimuli, and each stimulus was paired with 
the same response as in the first list. Separate 
groups of subjects were assigned to the six 
conditions, and they were informed that the 
colors and trigrams would consistently appear 
together in the first list. Prior to the transfer 
list, the subjects were told about the stimuli 
which would be presented on these trials. 

The word-color list required fewer mean 
trials (8.67) to attain the criterion of one 
perfect trial than the CCC-color list (10.52). 
Learning to a criterion of equal performance 
on the compound lists apparently implies that 
the major concern is the amount of decrement
on the recall trials and that cue effectiveness
of the components should be considered in re- 
lation to the effectiveness of the compound. 
Underwood et al. were not directly concerned 
with the difficulty of learning the compound- 
number pairs. However, the interpretation of 
these differences is relevant to stimulus selec- 
tion and is considered further in relation to 
studies comparing the learning of single- and 
multiple-stimulus lists. 

The mean numbers of correct responses on 
the first transfer trial, following learning of 
the word-color list, were 6.3, 5.0, and 3.8 to 
the word-color compound, word, and color 
stimuli, respectively. The mean correct re- 
sponses following learning of the CCC-color 
list were about (estimated from Figure 1) 
5.5, 2.6, and 5.5 for the CCC-color com-
pound, CCC, and color stimuli, respectively.

In the present paper, the empirical estimate
of learning is called cue effectiveness. Thus
the word-color compound was a more effective 
cue than the CCC-color compound. Note that 
cue effectiveness is measured by the amount 
of recall to the cue, either at the end of learn- 
ing or on an immediate recall. Comparison of 
cue effectiveness requires the same oppor- 
tunity to learn. In the Underwood et al. 
study, where the two lists were presented 
for different numbers of trials, cue effectiveness
of the different components in the two com- 
pound lists cannot be compared directly. The 
conclusion about the relative effectiveness of 
the compounds is possible only because the 
word-color compound produced more cor- 
rect responses after fewer learning trials. 
Within the word-color list, the compound was 
most effective and the color the least effective 
cue. In the CCC-color list, the compound and 
color were equally effective and the CCC less 
effective. 
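
As a rough illustration of the comparison just described, the reported first-transfer-trial means can be restated as proportions of the seven pairs in each list (the responses were the digits 2 through 8). The sketch below is only an arithmetic restatement of the figures quoted above; the function and variable names are hypothetical, the CCC-color values are the figure-based estimates, and proportions should be compared only within a list, since the two lists received different numbers of learning trials.

```python
# Mean correct responses on the first transfer trial, as quoted in the text
# for Underwood et al. (1962). CCC-color values are estimates from a figure.
N_PAIRS = 7  # responses were the single digits 2 through 8

first_transfer_means = {
    "word-color": {"compound": 6.3, "word": 5.0, "color": 3.8},
    "CCC-color": {"compound": 5.5, "CCC": 2.6, "color": 5.5},
}


def cue_effectiveness(mean_correct, n_pairs=N_PAIRS):
    """Proportion of pairs recalled correctly to a given cue."""
    return mean_correct / n_pairs


def rank_cues(list_means):
    """Cue names ordered from most to least effective within one list.

    Ties keep their input order because Python's sort is stable.
    """
    return sorted(list_means, key=lambda cue: list_means[cue], reverse=True)


for list_name, means in first_transfer_means.items():
    for cue, mean in means.items():
        print(f"{list_name:10s} {cue:8s} {cue_effectiveness(mean):.2f}")
    print(f"{list_name:10s} order: {rank_cues(means)}")
```

Within the word-color list the ordering is compound, word, color; within the CCC-color list the compound and the color tie and the CCC is lowest, which matches the verbal summary above.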

Underwood et al. pointed out that color, 
in the CCC-color condition, was as effective 
a cue as the CCC-color compound and stated 
that this precludes an interpretation of the 
functional stimulus being a configuration. Al- 
though this is probably true with CCC-color 
compounds, it does not seem that the evi-
dence presented forces this conclusion. It is
possible that equal effectiveness of a com-
ponent and the compound indicates that the
component is sufficient to reinstate the com-
pound, and thus the two are equally effective
for different reasons (e.g., the first letter of 
a word may elicit the word which in turn 
elicits the response). The possibility that a 
component elicits a correct response because 
of mediation by another component was con- 
sidered, but apparently not the possibility of 
mediation by the compound. 

In discussing the differences between the 
cue effectiveness of the components, Under- 
wood et al. considered possible differences in 
difficulty of learning when the components 
were not part of a compound. Three new 
groups of subjects were run, and they learned 
either the CCC-number pairs, the word-num- 
ber pairs, or the color-number pairs. The 
mean correct responses in 10 trials were 38.80, 
51.20, and 50.40, respectively. As a result of 
the lack of difference in the difficulty of the 
word-number list and the color-number list, 
the authors concluded that the greater trans- 
fer to the words, as compared to the colors, 
may be due to a subject bias toward dealing 
with verbal materials. On the other hand, the 
greater transfer to the colors compared to the 
CCCs was assumed to be due to the higher 
meaningfulness of the colors. This implies 
that cue effectiveness may be due to two 
quite different things: the component-response 
difficulty and the cue selection by the subject. 
This distinction has been made by Houston 
(1967) and, although discriminability is not 
obviously involved in most cases, it seems 
analogous to the Imai and Garner (1965)
distinction between preference and discrimin- 
ability in classification tasks. The difficulty 
of learning to give the response to the com- 
ponent would contribute to cue effectiveness 
even if selection were perfect, that is, even 
though the subject could effectively ignore 
all other components. The cue selection is 
considered an activity on the part of the sub- 
ject and, although it is to be expected that 
component-response difficulty would have an 
effect on cue selection, there are variables that 
affect cue selection and do not affect com- 
ponent difficulty. Cue effectiveness of a com- 
ponent is considered to be the net result of 
cue selection and component-response diffi- 
culty. 
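
The relation among these three terms can be sketched numerically. The paper later specifies cue selection operationally as the difference between cue effectiveness (recall to a component learned within a compound) and component-response difficulty (recall to the same component learned alone under the same conditions). The snippet below is a minimal sketch under that reading; the function name and all proportions are hypothetical and invented for illustration, not data from any study cited here.

```python
def cue_selection(cue_effectiveness, component_difficulty):
    """Difference between recall to a component learned in a compound and
    recall to the same component learned alone (both as proportions).

    A value of 0 corresponds to perfect selection; a value equal to
    -component_difficulty corresponds to no learning to the component.
    """
    return cue_effectiveness - component_difficulty


# Hypothetical proportions correct, for illustration only.
components = {
    "word": {"in_compound": 0.70, "alone": 0.75},
    "color": {"in_compound": 0.40, "alone": 0.72},
}

selection = {
    name: cue_selection(p["in_compound"], p["alone"])
    for name, p in components.items()
}

# The component whose recall falls least below its learned-alone level.
most_selected = max(selection, key=selection.get)
print(selection, most_selected)
```

In this invented case the two components are nearly equal in difficulty when learned alone, so the lower effectiveness of the color within the compound would be attributed to selection rather than to component-response difficulty.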

Underwood et al. (1962) also considered
[...] with the word and color components which
had been demonstrated to be equal in com- 
ponent difficulty. The words were the more 
effective stimuli, but this could have been 
due either to more subjects using words as the 
functional stimuli for all pairs or to subjects 
using words as functional stimuli for more 
pairs—that is, each subject could have con- 
sistently used one type of component as the 
functional stimulus with differences in sub- 
jects as to which type of component they 
used, or individual subjects could have used 
different types of components as the func- 
tional stimulus for different pairs. In the 
present paper, the tendency to select the same 
type of component as the functional stimulus 
for each pair is called consistency of cue selec- 
tion. This may refer to subject consistency or 
to group consistency. Each individual subject 
could be completely consistent in selecting 
one type of component as the functional cue 
and could be less consistent when considered 
as a group. Of course the group cannot more 
consistently select one type of functional 
stimulus than the individual subjects. 

There seems to be one additional distinc- 
tion necessary for discussion of studies of 
stimulus selection in paired-associate learning 
(Harrington, 1969). Consider the case of a 
subject who consistently uses one type of 
component as the functional stimulus in that 
all components of that type elicit the correct 
response following learning with the com- 
pound stimuli. However, the other type of 
component may also elicit some correct re- 
sponses. If the possibilities of mediated recall 
and guessing can be eliminated, the subject 
has in some cases learned the response to each 
component of the compound. This is closely 
related to individual consistency of cue selec- 
tion but leaves the possibility that cue selec- 
tion may be consistent without being perfect. 
This is called the efficiency of cue selection 
and refers to the possibility that to some ex- 
tent more than one component of a com- 
pound may become functional. This may be 
due to alternation of attention, to division of 
attention, or to a search for the best cue, but 
efficiency is specified empirically by the 
probability that only one component from a 
compound will become a functional stimulus.

The previous discussion was an attempt to
distinguish among cue effectiveness, com- 
ponent-response difficulty, cue selection, con- 
sistency of cue selection, and efficiency of cue 
selection. These are empirical terms and are 
not intended to be explanatory except in the 
sense that operationally specifying different
empirical phenomena which are sometimes 
confounded is explanatory. Consider 7 trials 
of paired-associate learning at a specified 
rate, intertrial interval, etc.; the stimuli are 
compounds, response learning is minimized, 
there is no relationship or association among 
the stimuli and responses prior to learning, 
and each stimulus item is assigned a different 
response. Cue effectiveness refers to the 
empirical probability that the compound or 
components will elicit the correct response 
following learning. Component-response diffi-
culty refers to the empirical probability that
a specified type of component elicits the cor- 
rect response when it is used as the nominal 
stimulus in the same situation, that is, the 
same response pairing, rate, number of trials, 
etc. Cue selection refers to the difference be-
tween cue effectiveness and component-re-
sponse difficulty. If cue selection is perfect
then cue effectiveness and component diffi- 
culty should be equal, that is, learning to the 
component in the compound should be equiva-
lent to learning to the component alone. If
there is no selection of a component (i.e., it is
ignored in the compound), there
should be no learning to the component. Thus 
any learning to a component in a compound 
demonstrates some cue selection. However, the
[...] component-response difficulty in each case.

In a comparison of components, the one with the greatest amount of
cue selection is said to be the
dominant component. [...]
of cue selection are considered to have two
aspects. Consistency of cue selection refers
to amount of the difference in cue selection
and is determined by the same operations as
used to determine which is the dominant cue.
Efficiency of cue selection refers to the tend-
ency for one component to become a func-
tional stimulus without another component of
the compound becoming a functional stimulus
at the same time. Note that the efficiency of
cue selection can be determined only if each
subject recalls with each component of the
compound as a cue. In the case where the
components cannot be classified in any way,
consistency of cue selection does not apply to
the individual subject, but efficiency of cue
selection can still be determined. Efficiency
of cue selection sets the upper limit of con-
sistency of cue selection.
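
Efficiency of cue selection, as described above, requires that each subject be tested with each component of every compound. A minimal sketch of how such data might be scored follows. It assumes two-component compounds, assumes mediated recall and guessing have been ruled out, and restricts the estimate to pairs for which at least one component was functional; that restriction, the function name, and the sample data are assumptions made for illustration and are not taken from the paper.

```python
def efficiency_of_cue_selection(recall):
    """Estimate the probability that only one component became functional.

    recall: list of (first_correct, second_correct) booleans, one tuple per
    pair, recording whether each component alone elicited the response.
    Pairs with no functional component are excluded (an assumption).
    Returns None when no pair has a functional component.
    """
    functional_counts = [int(a) + int(b) for a, b in recall]
    learned = [count for count in functional_counts if count > 0]
    if not learned:
        return None
    return sum(1 for count in learned if count == 1) / len(learned)


# Hypothetical recall record for one subject on five pairs.
subject_recall = [
    (True, False),
    (True, False),
    (True, True),    # both components elicited the response
    (False, False),  # neither component was functional
    (True, False),
]

print(efficiency_of_cue_selection(subject_recall))
```

In this invented record the subject is fully consistent in favoring the first type of component, yet selection is not perfectly efficient, because on one pair the second component also became functional.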


PAIRED-ASSOCIATE LEARNING WITH
COMPOUND STIMULI 


In studies of stimulus selection, the com-
ponents of the compound are usually re-
dundant and relevant. Since any component
of the compound can logically become the
functional stimulus for the response, it might
be expected that subjects would select the
most effective component. Due to individual
differences in which component is the most
effective, this should result in a tendency for
a list with compound stimuli to be less diffi-
cult than a list with any of the single com-
ponents as stimuli. This seems to be the
case in classification tasks with redundant
relevant cues (Trabasso & Bower, 1968).
However, with verbal stimuli—in this case,
printed combinations of letters presented visu- 
ally—a paired-associate list with compound 
stimuli tends to be more difficult than a list 
with the least difficult components as stimuli. 

Paired-associate lists with single word
CVCs as stimuli tend to be less difficult than
with compounds of word and low-meaningful
CVCs (Cohen & Musgrave, 1964); single
low-similarity CVCs tend to be less difficult
than compounds of low- [...]
CVCs (Cohen & Musgrave, 1966); single
CVCs are less difficult than double (Greeno
& Horowitz, 1968); and single-word stimuli
are superior to multiple-word stimuli (Ber-
lyne, Borsa, Craw, Gelman, & Mandell, 1965;
Horowitz, Lippman, Norman, & McConkie,
1964; Musgrave & Cohen, 1964; Pan, 1926).

It has been found that color stimuli are
superior to compounds of colors and CVCs
[...] (Baumeister & Berry,
1968), to compounds of colors and CCCs
(Solso, 1968a), and tend to be superior to
compounds of colors and words (Baumeister
& Berry, 1968; Solso, 1968a). Sundland and 
Wickens (1962) compared words with 
color-word compounds and CVCs with color- 
CVC compounds and the compounds tended 
to be superior, but the differences were very 
small and not consistent. In one case (Hill & 
Wickens, 1962), compounds of colors and 
CVCs were superior to either colors or CVCs 
alone, and in one case (Saltz, 1963) color- 
word compounds were superior to either 
colors or words. It is not obvious why these 
two studies produced results different from 
the others. In general, a compound-response 
list is more difficult than a component-re- 
sponse list. When a compound consists of two 
components which differ in difficulty, the 
compound-response list tends to be more 
difficult than the less difficult component-re- 
sponse list, but less difficult than the more 
difficult component-response list. Or, to say it 
another way, the difficulty of a compound 
list is influenced by the difficulty of both 
components; the compound-list learning does 
not suggest that there is complete selection of 
either component. If one component is the 
same in two compounds, the compound diffi- 
culty varies directly with the difficulty of the 
second component (Cohen & Musgrave, 1964, 
1966; Underwood et al., 1962) or there is no 
significant difference (Young, Teeters, & 
Zelazny, 1966). This does not support the 
rather reasonable assumption that the con- 
sistency of selection of the less difficult com- 
ponent increases as the difference between the 
difficulty of the components increases. 

It is sometimes suggested that the difficulty
of a compound-response list is produced by 
the necessity to integrate or unitize the stim- 
ulus compound. Horowitz et al. (1964) found 
that a triad of associated words used as stimuli 
produced faster learning than a triad of un- 
related words, but a list with single words 
as stimuli was less difficult than either. Liftik 
and Leicht (1968) found somewhat similar 
results. Greeno and Horowitz (1968) used 
two pretraining procedures with CVC com- 
pounds in an attempt to integrate the com- 
pounds prior to using them as stimuli in a 
paired-associate list. Not only were single 
CVCs superior to the pretrained compounds, 
but the pretraining on the compounds inter-
fered with learning paired-associate pairs with
the compounds as stimuli. The authors con-
cluded that the difficulty produced by com-
pound stimuli is not due to the learning time 
required for integrating parts of the com- 
pound and suggested that there may be poorer 
discriminability among compound stimuli. 


What Is Learned? 


It is not clear why subjects should spend 
time attempting to unitize two unrelated 
components in a task which permits selection. 
There are other possibilities for learning 
which may not contribute directly to the per- 
formance on a compound-response list. If two 
components of a compound differ in difficulty, 
then the time the subject spends learning the 
response to the less difficult component would 
make the task comparatively easier, and the 
time spent learning to the more difficult com- 
ponent would make the task more difficult. 
If the subject shares the time between the 
components in some way, then the compound 
difficulty should lie between that of the two 
components. This agrees with the evidence
presented thus far, but does not explain why 
a compound is more difficult than a com- 
ponent when the components are equal in 
difficulty, for instance, word-word compounds. 
It is possible that subjects spend appreciable 
time attempting to work out a rule for selec- 
tion when components are of equal difficulty, 
or that alternation or division of attention 
interferes with learning component-response 
connections. 

However, there are other possibilities. 
There is no evidence for summation when the 
same response is learned separately to two 
stimuli. If the two stimuli are presented as a 
compound, the probability of a correct re- 
sponse to the compound is approximately that 
expected from the combined probabilities of 
a correct response to the components (Hill & 
Wickens, 1962; Youssef & Dosey, 1968), and 
the latency of the response to the compound 
is longer than that to the components (Mus- 
grave, 1962; Musgrave & Cohen, 1964; Mus- 
grave, Cohen, & Robbins, 1967; Musgrave, 
Goss, & Shrader, 1963; Shepard & Fogel- 
songer, 1913). Thus, to the extent two com- 
ponent-response connections are learned in- 
dependently in a compound-response list, 
there is the possibility that the increased difficulty
is due to the lack of efficiency of cue
selection. If one component-response connec- 
tion is learned then the learning of the other 
component-response connection in that com- 
pound would contribute nothing to perform- 
ance. 
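
As a rough illustration of the "combined probabilities" comparison above, the following sketch assumes that the two component-response connections act independently, so that the compound elicits a correct response whenever at least one component does. The function name and the example probabilities are hypothetical and are not taken from the studies cited.

```python
def combined_probability(p_first: float, p_second: float) -> float:
    """Probability that at least one of two independent cues elicits the response.

    Assumption: independence of the two component-response connections.
    """
    for p in (p_first, p_second):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - (1.0 - p_first) * (1.0 - p_second)


# Hypothetical component probabilities, chosen only for illustration.
expected_for_compound = combined_probability(0.5, 0.5)
```

Under this assumption, an observed compound probability close to the computed value would indicate no summation beyond what independent connections already predict.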

There is also evidence for component-com- 
ponent learning in a compound-response list 
(Birnbaum, 1966; Davis, Brown, & Ritchie, 
1968; Leonard & Jacobus, 1969; Postman & 
Greenbloom, 1967; Spear, Ekstrand, & Under- 
wood, 1964). Steiner and Sobel (1968) pre- 
sented evidence for a direct association, not 
due to the common response, with color-word 
compounds. The evidence for component- 
component learning makes it possible that 
part of the response recall to components 
may be due to mediation by other com- 
ponents from the compound and not to a 
direct component-response connection (Underwood
et al., 1962). Postman and Greenbloom (1967) argued that stimulus selection
can be demonstrated only by the use of a
dual criterion to demonstrate a direct com[...]
[...] pronounce
CVC-digit pairs learned to a criterion of 6/6
correct [...] missing letters, [...]
greater probability of recall to these
letter cues than to those which had not
elicited any stimulus letters. Although the
[...] correlation between
trials to criterion and the number of stimulus [...]
Postman and Greenbloom pointed out, successive recall measurements
may introduce some difficulties.
Whatever the specific mechanisms underlying
the difference in compound and single-stimulus
difficulty, a cue selection type of
hypothesis requires that the difference in 
difficulty decrease as efficiency of cue selec- 
tion increases. This should serve to distinguish 
it from stimulus integration and stimulus dis- 
criminability hypotheses. There are probably 
cases where the stimulus components are 
integrated during learning, for instance, high-meaningful
CVCs. However, some type of
cue selection hypothesis seems plausible when
the components are unrelated and the compound
not easily represented by a unitary
response.


STUDIES OF CUE EFFECTIVENESS


In cases where there were fairly obvious 
differences in component difficulty, it has been 
found that the less difficult component is the 
more effective cue following compound-response
learning. This has been demonstrated
for components differing in formal similarity 
(Cohen & Musgrave, 1966; Newman & 
Taylor, 1963), in meaningfulness (Cohen & 
Musgrave, 1964; Lovelace, 1968; Spear et al., 
1964) and in distinctiveness (Jacobus & 
Leonard, 1968). The comparison of the com- 
pound with the components is more interest- 
ing. In a list with compounds consisting of 
colors and high-similarity CCCs, the com- 
pound is not a more effective cue than the 
color (Houston, 1967; Jenkins & Bailey, 
1964; Underwood et al., 1962), and words
have been found as effective as word-color
compounds (Sundland & Wickens, 1962).
However, in other studies the compound has
been a more effective cue than any single set
of components.

Presumably the major interest in comparing
the cue effectiveness of compounds and
components [...] to a component
should occur only if the subject can [...]
the compound from the co[...]
of the components are [...]
and the compound, but the interpretation of
differences in effectiveness becomes more 
difficult. To the extent that the compounds 
are the functional stimuli, the compounds 
should be more effective cues than the com- 
ponents. To the extent that the cue selection 
occurs and is not efficient, the compounds 
should be less effective than the components. 
If one component of a compound is a func- 
tional cue, this produces a recall to the com- 
pound and a recall to the component; if both 
components in a compound are functional 
cues, this produces one recall to the compound 
but two recalls to the components. From this 
standpoint it is necessary for the same sub- 
jects to recall with each component as a cue. 
This should increase the number of correct 
guesses, compared to the recall to only one 
set of cues, but permits the determination of 
the number of different responses recalled 
comparable to the recall to the compound. It 
is also comparable in the sense that each sub- 
ject has an opportunity to recall to each 
component. 
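
The bookkeeping described above can be sketched as follows: when the same subject recalls to each component, a response recalled to both components counts twice in the total but only once in the number of different responses, and it is the latter that is comparable to recall to the compound. The function name and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple


def tally_component_recall(recalled_by_cue: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (total recalls, number of different responses) across component cues."""
    total_recalls = sum(len(responses) for responses in recalled_by_cue.values())
    different_responses = len(set().union(*recalled_by_cue.values()))
    return total_recalls, different_responses


# Hypothetical subject: digit "5" is recalled to both the color and the word.
recalled = {"color": {"2", "5", "7"}, "word": {"5", "9"}}
total, different = tally_component_recall(recalled)
```

Here the subject produces five recalls in all but only four different responses, so four is the figure to set against recall to the compound.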

Thus, it seems necessary for subjects to 
recall to each component in order to deter- 
mine the extent to which the compounds are 
functional stimuli. Postman and Greenbloom
(1967) compared recall to CVC compounds 
with recall to the single letters. They found a 
mean recall of 5.72 to hard-to-pronounce 
CVC compounds and a mean of 5.17 different 
responses recalled by the single-letter groups 
which recalled with each letter as a cue. The 
extent to which mediated recall and guessing 
decreased the differences is not known, but it 
seems safe to conclude that there were more 
correct recalls to the compound than would 
be expected from the recall to the components. 
Presumably this means that the compounds 
were functional stimuli in some cases. There 
should be more studies comparing cue effec- 
tiveness of compounds and components. 


STUDIES OF CUE SELECTION


Since cue selection is considered the result 
of an activity on the part of the subject, there 
may be many fairly subtle task and in- 
structional variables which influence the sub- 
ject's strategy and determine the amount of 
selection of a cue from a compound. Determination
of the amount of cue selection and
of the consistency of cue selection requires
measurement of component difficulty. Mea- 
surement of the efficiency of cue selection 
requires recall by the same subjects to each 
of the components to be compared. The ex- 
isting studies of cue selection have usually 
either not included all the required measure- 
ments or have not presented the data in a 
fashion to permit computations of cue selec- 
tion, consistency, or efficiency. The com- 
ponent-response difficulty is usually not mea- 
sured, but there are several designs which 
have been used and which permit statements 
about relative amounts of cue selection and 
consistency. These designs have been grouped 
into types corresponding to the type of information
that can be obtained from the
design. It should be noted that measurement 
of the efficiency of cue selection requires only 
that the recall to the components of a com- 
pound be a within-subject variable and is 
independent of the type of design specified 
here. 


One Component Constant 


If the same component is included in two 
different compounds, the difficulty of the com- 
ponent may remain constant. A comparison of 
the effectiveness of the common component 
in the two compounds permits a statement 
about the relative amount of selection of the 
component. 

Compounds with a common component. 
Underwood et al. (1962) used color-CCC and 
color-word compound stimuli with digit re- 
sponses. After learning to a criterion of 7/7 
correct, colors were more effective cues when 
they had been paired with CCCs to form the 
compound than when they had been paired 
with words. The color-CCC list required more 
trials to learn to criterion so the differences 
may be overestimated. However, the results 
suggest that cue selection of a component in- 
creases as the difficulty of the other pos- 
sibility increases, and two other studies sup- 
port this conclusion. Young, Farrow, Seitz, 
and Hays (1966) used a similar design but 
tested for backward recall following learning 
of the compound lists. The results were com- 
parable: more colors were recalled when they 
had been included in a compound with low 
association-value CVCs than in a compound 
with high association-value CVCs. Sundland
and Wickens (1962) held the number of
learning trials constant and found colors were 
more effective cues after learning with a 
color-CVC compound than after learning with 
a color-word compound. 

Prior learning with one component. Houston
(1967) used a list consisting of eight
digit-color compound stimuli with adjective 
responses. Following six learning trials, the 
subjects recalled with either the colors, the 
digits, or the compounds as cues. The major
independent variable was the type of training
prior to learning [...]
This consisted of [...]
and either adjective responses unrelated to
[...] (the A-B, A-C
negative transfer paradigm with respect to
the pretrained component) or adjective responses
[...] the corresponding
responses in the compound list (the A-B,
A-B’ positive transfer paradigm). The effec[...]
each type of pretraining
within types of recall cues was not
reported. However, the nonpretrained component
was a more effective cue for recall
when the pretrained components had been
used as stimuli in the negative transfer paradigm
(X = 4.38) than when they had been
used as stimuli in the positive transfer paradigm
(X = 2.88). The pretrained components
were more effective cues when they had been
used in the positive transfer paradigm (X =
5.88) than when they had been used in the
negative transfer paradigm (X = 4.88). As
[...] it cannot be determined
[...] primarily the result of the positive transfer
condition rather than the negative transfer
condition. This suggests that selection of a
component may be reduced by prior learning
which makes another component a more effective
stimulus, but may be changed very
little by prior learning which produces interference
with the other component.


Equal Component-Response Difficulty


If the components of a compound are 
demonstrated or assumed to be equal in diffi- 
culty, the comparison of the effectiveness of 
the components permits a statement about 
the relative amount of selection.

Underwood et al. (1962) used color-word 
compounds as stimuli and demonstrated, in 
a fashion which did not permit computation 
of the amounts of cue selection, that the 
difficulty of the color-response list and the 
word-response list was equal. Words were 
more effective cues for recall than colors, so 
it was demonstrated that words were selected 
more than colors during the compound list 
learning. 

There have been several studies in which
the components were classified according to
the position in the compound. Since the same
type of stimuli appear in each position, it is
assumed that the component difficulty does
not vary among positions. In studies which
have used trigrams as compounds and tested
to the single letters, it has been fairly consistently
found that the letter in the first
position was dominant. With CCCs as compounds,
it has been found that the first letter
is the most effective cue, either with a tendency
for the middle letter to be least effective
(Jenkins, 1963; Richardson & Chisholm,
1969) or with little difference between the
effectiveness of the last two letters (Lovelace
& Blass, 1968; Rabinowitz & Ro[...];
Rabinowitz & Witte, 1967). The first letter
is selected from a CVC [...]
1968; Postman & [...]
with three-letter words the middle letter is
least effective while the first and last
are about equally effective [...]
1968). The results with [...] compounds
have not been consistent. [...]
found that the left CVC [...]
(Cohen & Musgrave, 1964, 1966; Lovelace,
1968). However, the position of the com- 
ponent in the CVC-word compound ap- 
parently had no effect (James & Greeno, 
1967), and although it has been found that 
the left word in a word-word compound tends 
to be slightly more effective, the differences 
have not been significant (Cohen & Musgrave, 
1964; Harrington, 1969). 


Variables Which Do Not Affect Component- 
Response Difficulty 


If it is reasonable to assume that the in- 
dependent variable does not change the diffi- 
culty of the components, any change in the 
cue effectiveness of a component permits a 
statement that the amount of selection has 
changed in the same direction. If, in addition, 
the components are equal in difficulty, a 
change in the relative effectiveness of two 
components permits a statement about a 
change in relative consistency of cue selection. 

Instructions. In some of the early studies 
of context (e.g., Weiss & Margolius, 1954) 
the instructions to the subject omitted any 
reference to the context cues while in some 
later studies of stimulus selection (e.g., 
Underwood et al., 1962) the instructions care- 
fully described the components and their 
relationships. In spite of the potential im- 
portance of instructions to the subject in de- 
termining what is learned, several published 
studies have not been explicit about the in- 
structions given to the subjects. Apparently 
only one study has varied instructions in a 
direct attempt to change cue effectiveness. 
Houston (1967) presented a list with color- 
CCC compound stimuli and digit responses. 
The subjects were instructed that they would 
later be tested with the CCCs alone, with 
either colors or CCCs, with the colors alone, 
or they were given no specific instructions 
about a test. After learning to a criterion 
of 7/7 correct, the subjects recalled and 
relearned with either the compounds, the 
colors, or the CCCs, as stimulus items. Al- 
though there were no significant differences 
in recall to the compounds or in recall to the 
colors as a function of instructions, the recall 
to the CCCs varied as would be expected 
from the instructions. The group instructed 
that they would be tested with the CCCs re- 


called the most correct responses to the CCCs. 
The number of correct responses to the CCCs 
decreased as instructions were changed from
test with CCC, to test with either, to no instructions,
and to test with color. Although the cue
effectiveness of the CCCs increased as sub- 
jects were led to expect their use alone in a 
recall test (and this may indicate increased 
selection of the CCCs), the interpretation of 
the results is not clear. The instructions also 
increased the number of trials to learn in the 
same fashion as the number correct on CCC 
recall. It remains possible that part of the 
increased recall was a function of the addi- 
tional learning trials prior to recall. How- 
ever, comparison of the increased recall with 
increased numbers of learning trials in an- 
other experiment of the same study sug- 
gests that this was a minor factor. In a retro- 
active inhibition (RI) paradigm, Schneider 
and Houston (1968) used trigrams as stimuli 
in the first list and the same trigrams com- 
bined with colors as stimuli in the second list. 
Telling subjects that they would be tested 
on the trigrams in the second list increased 
RI. 

Instructions to the subject probably deter- 
mine, to a great extent, what is learned 
within any given presentation procedure and 
warrant more investigation. However, in- 
structions may be much more effective in 
situations where there are comparatively small 
differences in component difficulty. It may be 
difficult to get subjects to select the most 
difficult component of a compound. 

Color. In these experiments a color was 
used to emphasize one component in each 
compound. Since the color was the same in 
each compound, it is assumed that the color 
does not change the difficulty of the components.
It permits an additional classification
of the components which may be used
as a rule for selection. 

Rabinowitz and Witte (1967) used seven 
CCC-digit pairs learned to a criterion of two 
successive perfect trials. Three lists were con- 
structed by printing either the first, second, 
or third letter of each CCC in red and the 
other two letters in black. A separate group 
of subjects learned each list and recalled with 
each of the 21 letters as cues. There were no 
significant differences in the trials required to 
learn the three lists. The cue effectiveness of
the letters varied with red-letter position, and
there was an interaction of letter position
and color. With the red letter in the first
position the mean number of correct responses
recalled to letters from the first, second, and
third positions were 4.17, .91, and 1.25,
respectively. The letter in the first position
was dominant. With the red letter in the
second position, the mean number recalled
were 1.75, 2.58, and .91 to letters from the 
three positions. Compared to the condition 
with red in the first position, selection of 
letters from the first and third position was 
less, and selection of letters from the second 
was more. The second letter was dominant, 
and the group consistency of selection was 
less. The mean recall when the red letter was 
in the third position was 2.25, 1.25, and 2.58 
to letters from the three positions. Changing 
the position of the red letters in the CCCs 
changed the amount of selection of letters 
from the three positions, changed the position 
of the dominant letters, and changed the 
group consistency of selection. A very similar
study (Rabinowitz & Robe, 1966) gave
similar results.
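
The pattern in the Rabinowitz and Witte (1967) means can be summarized with a small sketch. It takes the position with the highest mean recall as dominant and uses the margin over the next-highest position as a rough stand-in for group consistency. That margin is an assumption made here for illustration (it relies on the stated assumption that component difficulty does not vary among positions) and is not a statistic reported by the authors.

```python
# Mean correct responses recalled to letters from positions 1, 2, and 3,
# keyed by the position of the red letter (values as reported in the text).
mean_recall = {
    "red first": (4.17, 0.91, 1.25),
    "red second": (1.75, 2.58, 0.91),
    "red third": (2.25, 1.25, 2.58),
}


def dominant_position(means):
    """Return (1-based dominant position, margin over the runner-up position)."""
    ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best + 1, round(means[best] - means[runner_up], 2)


summary = {condition: dominant_position(m) for condition, m in mean_recall.items()}
```

On this rough index the dominant letter follows the red letter in every condition, and the margin is largest when the red letter is in the first position, which agrees with the description of reduced group consistency in the other conditions.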

Harrington (1969) performed two experi- 
ments and computed the efficiency of cue 
selection. In the first experiment the eight 
compounds consisted of three-letter words of 
100% association value combined with CVCs 
of 10% association value. The responses were
digits. The position of the two types of com- 
ponents were counterbalanced, and there were 
three conditions: the words were emphasized 
by enclosing them in a yellow rectangle, the 
CVCs were emphasized in the same fashion, 
or neither component was emphasized. The 
lists were learned by the study-test method to 
a criterion of one perfect trial. The transfer 
list consisted of the 16 components, without 
any color present, presented for seven trials 
by the anticipation method. 

The list without color required more errors 
to criterion, but the differences were not 
significant. The differences between the num- 
ber of correct responses recalled to words 
and to CVCs varied with conditions, but there 
were more correct responses to the words in 
all conditions. It is reasonable to assume that
the CVC-digit pairs were more difficult than 


the word-digit pairs so it can only be con- 
cluded that the amount of word and CVC 
selection varied as a function of conditions.
The dominant component and the relative 
consistency cannot be specified. 

The components were classified as perfect 
and imperfect, with perfect being those to 
which responding was always correct through- 
out transfer. The two components in each 
compound were then classified as the word 
perfect, the CVC perfect, both perfect, or 
neither perfect. In the contingency table of
perfect words (W) and CVCs (NS) and imperfect
words (W̄) and CVCs (N̄S̄), the resulting
frequencies for NS W, N̄S̄ W, NS W̄,
and N̄S̄ W̄, respectively, were 3, 105, 5, and 47
for the word-emphasized condition; 8, 46, 28,
and 78 for the CVC-emphasized condition;
and 15, 143, 26, and 136 for the no-emphasis
condition. The proportion of all perfect items
in the NS W cell was .051, .177, and .151
for the word-emphasized, CVC-emphasized,
and no-emphasis condition, respectively. This
seems reasonable in that emphasizing the
more difficult of the two components produced
a decrease in efficiency and emphasizing
the less difficult produced an increase in
efficiency of cue selection.
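
The reported proportions can be approximately reproduced from the frequencies under one reading of the table, which is an assumption made here: the four frequencies are taken in the order both components perfect, word only perfect, CVC only perfect, and neither perfect, and each both-perfect compound is counted as contributing two perfect items. The function and variable names are hypothetical.

```python
# Frequencies per condition, read as (both perfect, word only perfect,
# CVC only perfect, neither perfect). This cell order is an assumption.
tables = {
    "word emphasized": (3, 105, 5, 47),
    "CVC emphasized": (8, 46, 28, 78),
    "no emphasis": (15, 143, 26, 136),
}


def both_perfect_share(both, word_only, cvc_only, neither):
    """Share of all perfect items that lie in compounds with both components perfect.

    Compounds with neither component perfect contribute no perfect items,
    so `neither` does not enter the computation.
    """
    perfect_items = 2 * both + word_only + cvc_only
    return 2 * both / perfect_items


shares = {condition: both_perfect_share(*cells) for condition, cells in tables.items()}
```

Under this reading the shares come out near .052, .178, and .151, which matches the reported .051, .177, and .151 to within rounding or truncation. A larger share means more compounds in which both components were learned, that is, less efficient cue selection.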

The second experiment (Harrington, 1969)
was similar to the first except the compounds
were word-word. [...]
learned the list with the [...]
in each compound emphasized
by a yellow rectangle around the word. The
[...] components from the [...]
no-color, so the dominant type of component
and the consistency of cue selection cannot
be compared in the two conditions. The
measurement of efficiency of cue selection is 
independent of the particular classification of 
the components. 

Prior learning with different compounds. 
This method is based on transfer of cue selec- 
tion when there are no components common 
to the two tasks. Richardson and Chisholm 
(1969) used CCC-digit pairs in the two lists. 
In first-list learning, use of the letters in 
one of the three positions as functional 
stimuli was forced by the high formal simi- 
larity of the letters in the other two positions. 
There were no replications of letters in the 
second-list stimuli, and, after learning, the 
individual letters from the second-list CCCs 
were presented as cues for recall. The cue 
effectiveness of the letters from the three 
positions was a function of the position of the 
functional stimulus letters in the first list. 
The letters which occupied the same position 
as the functional letters in the first list were 
the dominant letters in the second list. 


Measurement of the Difficulty of Components 


Sundland and Wickens (1962) presented 
data which permit the computation of the 
amount of selection of each component. There 
were 15 learning trials by the anticipation 
method with a test trial (only stimuli pre- 
sented) after learning trial 5, 10, and 15. 
The responses were nine three-letter words 
for all lists, and the five lists had different 
stimuli. The stimuli were either six-letter
words (W), two-syllable hyphenated nonsense
syllables of 7% association value (NS),
colors (C), the words on the colored back- 
grounds (C + W), or the nonsense syllables 
on the colored backgrounds (C + NS). Fol- 
lowing the test trial after 15 learning trials, 
the compound groups (C + W and C + NS) 
were given two recall trials with the com- 
ponents. The order of the recall trials was 
C then W (NS) for half of the subjects and 
the reverse for the other half. The number of 
correct responses on the last test trial was 
7.72, 6.88, and 6.87 for the W, NS, and C 
list, respectively. There was poorer recall on 
the second of the two recall trials for com- 
ponents from the compound lists so only the 


first is used to compute cue effectiveness. 
The mean number of correct responses at 
recall (computed from the percentages in 
Table 3 of the Sundland and Wickens study) 
were 8.33, 5.41, 1.24, and 3.95 for the W,
NS, C from the C + W list, and C from the
C + NS list, respectively. The estimates of
cue effectiveness are divided by the estimates
of the corresponding component-response difficulty,
and the cue selection is 1.079 (8.33/
7.72), .786 (5.41/6.88), .180 (1.24/6.87), and
.575 (3.95/6.87) for the W, NS, C from
C + W, and C from C + NS components,
respectively. The words were dominant in the
C + W list with a group consistency of .899
(1.079 - .180), and the nonsense syllables
were dominant in the C + NS list with a
group consistency of .211 (.786 - .575). The
verbal stimuli were selected in each com- 
pound, but the consistency was much greater 
when the verbal items were words than when 
they were nonsense syllables. The amount of 
cue selection may seem somewhat large, but 
in this experiment the performance at the 
end of learning was slightly better on the 
compound list than on either of the com- 
ponent lists for both the C+ W and the 
C + NS compounds. 
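
The computation just described can be written out as a short sketch. Cue selection is taken as cue effectiveness divided by the corresponding component-response difficulty estimate, and group consistency as the difference between the cue selection of the two components of a compound. The numbers are those given in the text; the variable and function names are hypothetical.

```python
# Correct responses on the last test trial of the single-component lists
# (the estimates of component-response difficulty).
difficulty = {"W": 7.72, "NS": 6.88, "C": 6.87}

# Mean correct responses at recall to each component, by compound list
# (the estimates of cue effectiveness).
effectiveness = {
    ("C+W", "W"): 8.33,
    ("C+W", "C"): 1.24,
    ("C+NS", "NS"): 5.41,
    ("C+NS", "C"): 3.95,
}


def cue_selection(effect: float, component_difficulty: float) -> float:
    """Cue effectiveness relative to component-response difficulty."""
    return effect / component_difficulty


selection = {
    key: cue_selection(value, difficulty[key[1]])
    for key, value in effectiveness.items()
}

# Group consistency: verbal component minus color component within each compound.
consistency = {
    "C+W": selection[("C+W", "W")] - selection[("C+W", "C")],
    "C+NS": selection[("C+NS", "NS")] - selection[("C+NS", "C")],
}
```

Rounded to three places, this gives cue selection of 1.079, .180, .786, and .575 and group consistency of .899 and .211, in agreement with the values in the text.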


Cue Selection as a Function of Amount of 
Training with the Compound 


The effect of amount of compound list 
learning on cue selection has important im- 
plications for the type of processes underlying 
the selection. If cue selection is the result of 
a subject’s strategy, then the consistency of 
cue selection should be low early in learning. 
After the subject has settled on a strategy, 
the consistency and efficiency of cue selection 
should increase. From the standpoint of the 
present analysis, an interaction of the degree 
of compound-list learning and the cue effec- 
tiveness of the components is not sufficient to 
demonstrate a change in cue selection unless 
the component-response difficulty is equal. Of 
course, even if the components are different 
in difficulty, an increase in the cue effective- 
ness of one component with no increase in 
the cue effectiveness of the other demon- 
strates increased cue selection. 

James and Greeno (1967) used word-CVC 
compound stimuli and digit responses. The 


compound list was learned to a criterion of
4/8, 8/8, or 8/8 correct plus 10 additional 
trials. Following learning with the compound- 
stimulus list, all subjects transferred to a 
list consisting of each of the 16 components 
presented separately and paired with the same 
responses as in the first list. With both the 
number of errors during five transfer trials 
and the proportion of items with zero errors 
in transfer, the transfer to the word-digit 
pairs increased as degree of first-list learning 
increased. The transfer to the CVC-digit pairs 
was the same for the two lower degrees of 
learning, 4/8 and 8/8, but increased in the 
group which overlearned the first list. These
results were confirmed in a second experiment, 
and a third experiment demonstrated that 
transfer to the CVC-digit pairs increased 
only after the entire list was mastered, that 
is, only after there were no more errors to 
any of the items. The authors suggested that 
subjects attend to the entire compound early 
in learning and then attend more selectively 
as learning continues; the selective mechan- 
ism is relaxed only after the entire list has 
been learned. 

Houston (1967) used color-CVC compound 
stimuli, and subjects learned for either 1, 2, 
3, 12, or 20 anticipation trials. They were
then tested with either the compounds, the
colors, or the CVCs as cues for recall. Colors
were more effective cues than CVCs, but there
was no significant interaction between degree
of learning and type of recall cue. Although
the increase was small, there was a tendency
for recall to the trigrams to increase with
the degree of first-list learning.

Lovelace and Blass (1968), in three differ- 
ent experiments, used either CCCs, pronounce- 
able 32%-38% association value CVCs, or 
98%-100% CVC words as stimuli. Learning 
was to a criterion of either 3/6, 6/6, or 6/6 
correct plus 50% overlearning. Following
learning, all the letters from the trigrams were 
presented individually as cues for recall. With 
both sets of CVCs, increased degree of learn- 
ing increased the effectiveness of the letter 
cues, but there was no indication of a differ- 
ential effect on letters from the three posi- 
tions. With the CCCs, increased degrees of 
learning did not increase recall to letters from 
the middle and last positions. However, there


was also very little increase in the recall to 
letters from the first position. 

The evidence available from these three 
studies suggests that cue selection increases
with amount of compound-list learning. The 
James and Greeno (1967) study clearly sup- 
ported this position, and the relative cue 
selection of the two components cannot be 
determined in the Houston (1967) study. 
The result with the CCCs in the Lovelace 
and Blass (1968) study also tends to sug- 
gest that cue selection increases with learn- 
ing trials, but the results with the CVCs do 
not. However, it may be reasonable to assume 
that little if any cue selection occurs with
pronounceable CVCs or words. 


FORMAL SIMILARITY AND MEANINGFULNESS 
OF STIMULI 


Underwood (1963) pointed out that an 
increase in the formal stimulus similarity 
should produce a greater increase in list 
difficulty with low-meaningful than with 
high-meaningful stimuli. This assumes that 
the subject does attend to all letters as a 
unit with high-meaningful stimulus items and 
does not with low-meaningful stimulus items. 
The experiments concerned with the inter- 
action of stimulus similarity and meaningful- 
ness have produced conflicting results (Goss, 
Nodine, Gregory, Taub, & Kennedy, 1962; 
Lockhart, 1968; McGehee, 1961; Nodine,
1963). Investigators have only recently become
concerned with the locus of the interstimulus
similarity (Nelson, 1968; Richardson
& Chisholm, 1969; Runquist, 1968a). It
seems that increasing the similarity among
the selected components of compound stimuli
should increase list difficulty more than increasing
the similarity among other components.
If there is a difference in whether individual
letters are used as functional stimuli
for low- and high-meaningful stimuli and a
difference in which of the individual letters
become functional stimuli, it seems there
should be an interaction between all com- 
binations of meaningfulness, similarity, and 
locus of similarity (Nelson, 1968). However, 
this prediction requires further examination. 

Consider the case where the high- and low- 
meaningful stimuli are represented by tri- 
grams, words and CCCs, in paired-associate 


lists with arbitrarily assigned responses which
are unrelated to any of the trigrams or the
components. If there is no replication of
letters among the trigrams, we would ex- 
pect the word list to be less difficult and to 
show less difference in cue effectiveness among 
the letters in the three positions than the 
CCC list. Although the associative learning 
in the word list should be to the words as 
units, we might expect an increase in formal 
stimulus similarity to increase the difficulty 
of the list, possibly due to increased acoustic 
similarity (Wickelgren, 1965). 

The letters in the CCC stimulus items 
are redundant, but assume that the first letter 
in each trigram is normally the selected cue 
and that there is very little learning to the 
letters in the other two positions. If the 
CCCs are changed so that a single letter is 
repeated in the first position, then the sub- 
ject must either use a different rule for selec- 
tion or permit an exception to the rule, that 
is, the list cannot be learned without using 
something other than first letters as functional 
stimuli. The increased difficulty of the list due 
to similarity in the first position would de- 
pend upon how easily the subject shifts to 
other letters and upon the component-re- 
sponse difficulty of the letters in the first 
position compared to the letters to which 
the subject shifts. Note that repeating letters 
within the first position would not necessarily 
be directly related to the difficulty of learning.
The use of very few letters, that is, high similarity,
might make it more obvious to the
subject that the letters in the first position
were not distinctive cues and result in selection
of other letters more rapidly than a
lower degree of similarity. The same type of 
analysis applies to similarity among letters 
within two of the three positions. Thus, as- 
suming that the subject's cue selection is 
very adaptable and that there is very little 
difference in difficulty among component- 
response combinations, increased similarity 
should not increase the list difficulty as long 
as a set of distinctive letters is available as
cues. There is also the possibility of increased
list difficulty with a very low degree of similarity.
Precise predictions cannot be made
without more information about component-response
difficulty, the flexibility of cue selection,
and the consistency with which subjects
select particular cues.

There is also an extreme case that should 
increase the difficulty of the CCC list more 
than the word list. If the similarity within the 
three letter positions is such that no in- 
dividual letter is a distinctive cue for a re- 
sponse then the functional stimuli in the 
CCCs must be combinations of letters, and 
in the extreme case the functional stimuli 
must be the sequences of three letters. It is 
at this extreme point that the integration 
of the letters in the word should provide the 
most advantage. 

Although Yum (1931) demonstrated that 
changing the first letter of CVC stimuli at 
recall produced more decrement than chang- 
ing a letter in other positions, there have 
been few studies concerned with the locus 
of formal interstimulus similarity. Runquist 
(1968b) and Runquist and Joinson (1968) 
collected ratings of similarity of pairs of tri- 
grams. The trigram pairs had either 0, 1, or 2 
letters in common, and all possible com- 
binations of positions for the common letters 
were used. An additional condition, spelled 
backwards, had all three letters in common. 
The rating of similarity increased as the num- 
ber of common letters increased. The locus 
of the common letters seemed to have little 
effect as long as the letters were in the same 
position in both trigrams, but there was a
somewhat higher similarity rating when a 
single common letter was in the first position 
instead of one of the other two positions. The 
effects of the number of common letters and 
locus seemed to be about the same with pairs 
of word CVCs, 90% or higher association- 
value CVCs, less than 25% association-value 
CVCs, and CCCs. The less than 25% as- 
sociation-value CVCs were used in a learning 
study (Runquist, 1968a). Presentation was 
by the anticipation method, and a light ap- 
peared just above one of six buttons to in- 
dicate the correct response. Separate stimulus 
lists were used for each of the seven simi- 
larity rules generated by 0, 1, or 2 letters in 
common and the restriction that the common 
letters had to be in the same position in each 
member of the pair. In general, the difficulty 
of the list increased as the number of common 
letters increased, but there seemed to be no 
systematic differences due to the locus of
the similarity. The author stated that an ex- 
planation in terms of stimulus selection was 
not denied by these data, but suggested that 
even low association-value CVCs are coded 
by pronunciation rather than by attending 
to the individual letters. 

Nelson (1968) used 12 CVCs as stimuli 
and two-digit numbers as responses. The
items were presented by a slide projector and 
learned to a criterion of 12/12 correct by 
the anticipation method following the learn- 
ing of a practice name-verb list. The stimuli 
were pronounced on each test presentation. 
The lists were constructed with CVCs of either 
12.5%, 56.5%, or 97.3% association value, 
and there were either 0, 4, 8, or 12 identical 
letters in either the first or last position of the 
CVCs. Neither similarity nor locus of simi- 
larity had any effect on the number of errors 
to criterion when the stimuli were 12.5% 
association-value CVCs. With CVCs of the 
two higher association values, the number of 
errors to criterion increased as the similarity 
increased, and similarity among the first 
letters tended to have more effect than 
similarity among the last letters. It is worth 
noting that not only was the interaction of 
meaningfulness and similarity opposite in di- 
rection to that sometimes expected from the 
stimulus selection approach, but that there 
was a consistent, though not significant, 
tendency for the difficulty of the low associa- 
tion-value lists to decrease as the number of 
identical letters increased. Although there
were identical letters in each list other than 
those specified by the variable (number of 
identical letters), in every case either the 
first letters or the last letters could be used 
as functional stimuli.

In the second experiment of the study, 
Nelson (1968) constructed lists of 15.7% 
and of 99.6% stimulus association-value by 
using two different letters for the first posi- 
tion, three different letters for the vowel 
position, and two for the last position of the 
12 CVCs. The procedures were the same as in
the first experiment, but in this case the low 
association-value list was much more difficult 
than the high association-value list. By 
analyzing these results combined with the first 
and last letter-similarity conditions of the
first experiment, Nelson found an interaction 
of stimulus association-value and locus of 
identical letters. Perhaps of more importance 
is his finding that stimulus similarity had no 
effect on list difficulty with low association- 
value stimuli when either the first or the last 
letters could serve as distinctive cues. When 
it was necessary to use the sequence of letters 
as distinctive stimuli, the difficulty of the 
low association-value list increased drastically. 
In another study (Nelson & Rowe, 1969), 
three-letter words were paired with digit re- 
sponses. The subjects learned a name-verb 
list prior to the experimental list and pro- 
nounced the stimulus words each time they 
were presented for recall. Similarity was 
varied by identical letters in each of the three 
positions and in combinations of two posi- 
tions. With identical letters in only one posi- 
tion, identity in the first position was most 
difficult followed by the last and middle in 
that order. With identical letters in two posi- 
tions, difficulty decreased in the order first 
and last, first and second, second and last.
These are roughly the results which would 
be expected from the results of stimulus 
selection studies, but Nelson and Rowe pre- 
ferred an explanation in terms of the amount 
of information in the various positions. It
should be noted that much of the procedure 
should have encouraged the subjects to en- 
code the stimuli as words rather than as 
sequences of letters; the stimuli were common
three-letter words, a name-verb list was
learned prior to the experimental list, and the
subjects pronounced the stimulus words each
time they were presented for recall.
Richardson and Chisholm (1969) used six 
CCC-digit pairs. The formal similarity was 
such that distinctive letters occurred only in
the first, middle, or last position. In each
case, the other two positions in the six CCCs
were occupied by only three different letters
so that any seque 


: ) study which 
distinctive letters in the 


3 of low association-value 
CVCs were equally difficult, 


CUE EFFECTIVENESS IN PAIRED-ASSOCIATE LEARNING 87 


SUMMARY OF RESEARCH 


It is extremely difficult to make a general
empirical statement about the results of studies
of cue selection with any degree of confidence.
In each case the evidence is either extremely
sparse or not entirely consistent.
Perhaps this is to be expected considering the
variety of materials, instructions, and procedures
which have been used to investigate
a phenomenon which is presumably the result
of an active, adaptable process. However,
before attempting to summarize the evidence,
it may be useful to discuss some general
problems.

First, consider Shepard's (1963) distinction 
among stimuli as to the degree of analyz- 
ability. He considered some stimuli as almost 
inevitably analyzed into components (e.g., an 
arbitrary trigram), and others as almost in- 
variably reacted to as units (e.g., colors), and 
suggested that this was related to meaning- 
fulness as used by Underwood (1963). Pre- 
sumably colors represent one end of this 
continuum and CCCs the other. Words should 
tend to be reacted to as units, and the re- 
action to CVCs should vary considerably, 
probably depending upon pronounceability. 
The degree of analyzability is to some extent
a tendency on the part of the subject and 
may change with conditions. Words can be 
analyzed into letters, that is, it seems obvious 
that the subject can react to a letter of a 
word without reacting to the word as a unit. 
A subject given a list of words, a set for 
speed, and asked to read the first letter of 
each word would probably not be able to 
recall very many of the words. Whether CVCs 
are treated as units may vary with conditions, 
for instance, requirements that the subject 
spell or pronounce the CVC as it appears. 
Presumably CCC trigrams can become units 
although the training which would produce 
this has not been specified. Thus the materials 
used as components run the range of analyzability:
colors, words, CVCs, and CCCs. All
of these except color have also been used as 
compounds with the single letters as com- 
ponents. The degree of analyzability is cer- 
tainly an important problem and basic to the 
concept of cue selection. However, very little 
is known about analyzability, and this may
result in some confusion in the stimulus selection
studies. It is plausible that the letters
from a word trigram produce different 
amounts of recall following learning, not be- 
cause of cue selection, but because the letters 
are differentially effective as redintegrative 
cues for the words (Horowitz & Prytulak, 
1969); in all cases the associative connec- 
tion may be between the word and the re- 
sponse. The same may often be true of CVCs 
(Postman & Greenbloom, 1967). Thus studies 
of cue effectiveness or cue selection may 
sometimes be comparing two different phenomena.

Second, consider the emphasis on type of 
material in many studies of stimulus selection. 
Of course it is necessary to have normative 
information about selection of different types 
of material, and hypotheses about the under- 
lying processes may be related to the type of 
material used as compounds. However, in 
many cases the major concern seems to have 
been comparing cue effectiveness of two types 
of components without any apparent orienta- 
tion other than the specific empirical ques- 
tion. 

Third, consider the variety of designs and 
criteria used to demonstrate stimulus selec- 
tion. Often the only conclusion possible is that 
one component is a more effective cue than 
another or that a component is a less effective 
cue than the compound. Some studies demon- 
strate cue selection, but the design is such 
that only relative statements can be made; 
one component is selected as the functional 
stimulus. The lack of a common quantitative
measurement of cue selection makes compari- 
sons among different experiments difficult. 

Now, consider the empirical results of the 
studies of stimulus selection. Paired-associate 
learning with compound stimuli consisting of 
redundant relevant components is usually 
more difficult than learning with one of the 
components as stimuli. In two cases (Hill & 
Wickens, 1962; Saltz, 1963) the compound 
list was less difficult than either component 
list, and in both cases color was one of the 
components. After learning a compound list, 
the compound is usually a more effective cue 
for recall than any single type of component. 
With one list, the color-CCC compound has 
consistently been found not to be more effective
than the color component (Houston,
1967; Jenkins & Bailey, 1964; Underwood
et al., 1962), and in another list the word
component was as effective as the word-color 
compound (Sundland & Wickens, 1962). It 
has been suggested that these comparisons of 
compound- and component-list difficulty and 
of compound and component cue effectiveness 
deserve more research to carefully analyze 
what is learned. The fact that on one hand 
the compound list is more difficult to learn, 
and on the other that the compound is the 
more effective cue, suggests another approach.
It might be profitable to attempt to equate
the two by maximizing the cue selection of a
component from the compound. To the ex- 
tent that this is possible, the differences be- 
tween compound and component difficulty 
and between compound and component cue 
effectiveness should decrease. This type of re- 
search would also help to establish the limits 
of cue selection.

When components of a compound differ 
in difficulty it seems clear that subjects tend 
to select the less difficult component and the 
amount of selection increases as the difference 
in difficulty of the components increases 
(Sundland & Wickens, 1962; Underwood et
al., 1962; Young, Farrow, Seitz, & Hays,
1966). The tendency for positive transfer
training with one component to reduce the
selection of the other component (Houston,
1967) also supports this position.

There is also selection even though the
components are equal in difficulty. The component
may be selected on the basis of type,
as words from a word-color compound (Underwood
et al., 1962), or on the basis of position
in the compound
Brave, 1964: Tenkins, 


of cue selection (Harrington, 1969). Although
these variables have been used with components
of equal difficulty, the evidence suggests
that they may interact with level of
difficulty. In Harrington's (1969) study,
emphasizing a word component with a colored
background increased the efficiency of selection
from word-CVC compounds from .151 to
.051 and that from word-word components
from .240 to .154. When the color emphasis
was placed on the CVCs in the word-CVC
compounds the efficiency decreased from .151
to .177. These results suggest that adding color
emphasis to a component not only changes
the efficiency of selection but that the direction
of the change depends upon whether the
emphasis is upon the less difficult or the
more difficult component.

Although there is very little available evi- 
dence (Houston, 1967), instructions are 
probably a variable in cue selection. 

The experiments concerned with stimulus 
similarity, meaningfulness, and cue selection 
demonstrate the complexities of attempting to 
determine the effect of two variables on cue 
selection before there is much evidence on the 
effect of either. It seems that the locus of similarity
can have an effect (Richardson &
Chisholm, 1969) or not (Nelson, 1968) on
paired-associate


In general, the data seem consistent with
Subjects attend to the


to the other (Harrington, 1969; James &
Greeno, 1967). Perhaps the evidence will be
more convincing when more consideration is
given to individual
tion. The fact that selection occurs even
with group measurements based on component
classification with components of approximately
equal difficulty suggests that the
learning is “intrinsically selective” (James
& Greeno, 1967), at least with adult subjects.
Apparently only one experiment has used
children as subjects (Baumeister & B = 
1968), and there was th e 
ponents following = Xu. 


With little information available, a rough theoretical
orientation may be useful for future
research. The simplest assumption seems to be
that cue selection is
the result of focused attention. Other sources
should be consulted for discussions of at- 
tention (cf. Egeth, 1967; Peterson, 1969; 
Trabasso & Bower, 1968; Treisman, 1969), 
but focused attention is considered a type of 
time-sharing device. Any time spent process- 
ing one component-response pair cannot be 
spent on another. The major determinant of 
the focus of attention in paired-associate 
learning is the relative difficulty of the com- 
ponent-response pairs which permit successful 
performance of the task. Arbuckle and Cuddy
(1969) have presented evidence that sub- 
jects can make the necessary discrimination. 
The subjects not only select a component 
from a compound but select the least difficult 
component. The studies of stimulus selection 
have concentrated on determining the func- 
tional stimulus, but it should be noted that 
the assumption about difficulty requires con- 
sideration of the response terms. Difficulty 
refers to the difficulty of learning so that the 
response term can be recalled when the stim- 
ulus is presented alone. Thus, any prior train- 
ing which changes the component-response 
difficulty should have an effect on cue selection.
This includes the assignment of responses
which are associations of one set of
components (Solso, 1968a, 1968b). The diffi- 
culty hypothesis also implies that subjects 
may select a component from a compound re- 
sponse term if this is permitted by the per- 
formance requirements of the task. For in- 
stance, the use of a multiple-choice test in 
paired-associate learning (Schulz, Weaver, & 
Ginsberg, 1965) may result in the selection of 
a component from a response term compound. 

The locus of the selection has received little 
attention in studies of stimulus selection. 
Jenkins and Bailey (1964) used trigram- 
color compounds and found that requiring 
subjects to spell the trigram aloud during 
learning had no significant effect on transfer 
to the components. Jenkins (1963) required 
subjects to spell CCC compounds aloud dur- 
ing learning and found more recall to the first 
and final letters than to the medial letters. 
However, Lovelace and Blass (1968) failed 
to find any significant differences in recall 
to letters from CCC compounds when subjects
articulated the compound during learning.
It should be noted that requiring the
subject to spell a compound stimulus aloud 
during learning may result in stimulus in- 
tegration or in the subject interpreting the 
instructions as meaning he should learn the 
response to all of the components. A lack of 
differential recall after the subject is required 
to verbalize the components is not strong
evidence that subjects do not pay attention 
to, or process, nonselected components. 

The stimulus selection studies have used 
components which were presented visually 
and were spatially separated. In this case, cue 
selection may be by peripheral control of the 
stimulus input. Although the efficiency of 
selection may change, there is no reason to 
assume that selective learning could not occur 
if other types of selection (Treisman, 1969) 
were required. 

It has been assumed that difficulty of learning
is the primary determinant of the selection
of pairs to be learned and thus of the
functional stimulus. However, the selection is 
assumed to occur through the focus of atten- 
tion, and there are many other things which 
should determine the focus of attention. Prior 
use of a class of components as stimuli, em- 
phasis by color, and instructions are ex- 
amples of variables which seem to focus at- 
tention on certain components. Groups of 
subjects rather consistently select certain com- 
ponents as stimuli, and the evidence for con- 
sistency and efficiency may increase as in- 
dividual selection is systematically examined. 
There are two striking things about this 
selection. First, the selection of components 
can be described by a simple rule, and second, 
the rule specifies classification into selected 
and nonselected components on the basis of 
a characteristic which does not require dis- 
crimination among selected components. For 
instance, words in word-word compounds can 
easily be classified according to position or to 
color emphasis without the subject processing 
the words so they are distinguishable as 
words. Of course the use of simple rules for 
classification may depend upon the type of 
material, but further investigation should give 
some information about whether the stimulus 
processing necessary for classification results
in associative learning. Variation of type of 
compounds and stage of compound-list learning
should also give information about the
level of selection, or type of processing, which
results in associative learning of the nonselected
component-response pairs.

Future research using the techniques of the 
studies of stimulus selection in paired-associ- 
ate learning need not be primarily concerned 
with determining what type of component is 
selected as the functional stimulus. The major 
concern may be with what type of processing 
interferes with, or is necessary for, associative 
learning or with the comparison of the processes
of abstraction with those in concept-identification
tasks.
appropriately.


Heritability is best understood as a secondary 
criterion in the construction of a psychological 
test. Let the materials for the test be selected 
by other criteria—content, and predictive or 
concurrent relations with external measures. 
It remains to put these materials together in 
final form. In personnel selection and other 
areas where predictive validity applies, there 
exists an empirical criterion competent to 
assemble as well as select test materials. But 
in the testing of abilities, personality, atti- 
tudes, and interests, in areas where construct 
validity normally applies, no such empirical 
criterion exists. Content and external relations
do not amount to a hard-and-fast
criterion. With rare exceptions, they permit 
the selection of materials but not definitive
assembly.

The great majority of psychological tests 
are finally assembled on internal consistency 
as a secondary criterion. After the primary 
criteria have been imposed, test materials are 
dropped or differentially weighted on the basis 
of their relations to the entire pool of eligible 
materials defined by the primary criteria.
Heritability is not competitive with content 
or relations with external measures, but it does 
compete with any secondary criterion like 
internal consistency which might otherwise 
serve as a basis for the final assembly of the
test. The two standards are not mutually
exclusive, at least not rigidly. It is possible to
insist that all materials meet minimal require- 
ments of internal consistency and then impose 

* Requests for reprints should be sent to Marshall 
B. Jones, Department of Behavioral Science, Pennsyl- 
heritability as a criterion in final assembly.
The reverse is also possible, that is, to insist 
that all materials meet minimal requirements 
of heritability and then impose internal consistency
as the final criterion. Which way a
test constructor went would depend on whether 
he thought homogeneity or heritability was 
the more critical desideratum in the test he 
was building. 

Most psychological tests claim to be measuring
something basic. The traits they are
designed to measure are thought to be fundamental,
something deeply rooted in a person's
nature. But if they are, what makes them basic? 
Granted that a test of introversion is not just
basic (it should also have something to do
with introversion), what guarantees that it
is basic as well? In building a test that is not 
only supposed to get at something but to get 
at it in fundamental form, shouldn’t the construction
process include some surety that it
does? And if it should, what better criterion 
could we adopt than that it be maximally 
heritable? Once we have selected materials 
which satisfy relevant criteria as to content 
and external relations, why not go further
and assemble the test in a way which makes it
as basic, as heritable as possible? 
A few attempts have been made in test 
criteria, and use the results to assemble the
test in final form. It is an extraordinary fact 
that no psychological test has ever been 
constructed in this fashion. For all the emphasis 
that many psychologists have placed on 
heritability in various kinds of tests, no 
psychologist has ever used twins in construct- 
ing a test with heritability as a criterion. 
The purpose of this paper is to describe and
illustrate a twin method for maximizing herit- 
ability in the construction of a psychological 
test. The method itself was developed by 
Vandenberg (1965) and, in more detail, by 
Bock and Vandenberg (1968), though they 
used it for another purpose. The illustrations 
are from Vandenberg's recent and extensive 
studies of psychological functioning in twins. 


METHOD 


The use of monozygotic and dizygotic twins
to maximize heritability is not ideal. Any 
nongenetic effect that associates differentially 
with the two kinds of twins either inflates or 
deflates the genetic contribution, according 
as it promotes or inhibits similarity in iden- 
tical relative to fraternal twins. Nevertheless, 
twin differences are a very good approach to 
heritability, even by absolute standards, and 
certainly much the best we have at the human 
level. We begin, therefore, with $N_m$ pairs of
monozygotic and $N_d$ pairs of dizygotic twins.
The data are scores for all individuals in both
series on each of p test variables. We use the
notation $(xy)_{ij}$; x = 1 or 2, y = m or d, to
indicate the score on the jth test variable of
one or the other twin, 1 or 2, in the ith monozygotic
(m) or dizygotic pair (d). The difference
scores $(-y)_{ij} = (1y)_{ij} - (2y)_{ij}$ constitute
the working base for the analysis.

The heritability of a single variate can be
determined by comparing the variance estimates
within twinships in the monozygotic
and dizygotic pairs. The variance estimate
(MS) within monozygotic (WM) pairs for
the jth test variable

$$\mathrm{MSWM}_j = \frac{\sum_{i=1}^{N_m} (-m)_{ij}^2}{2N_m}$$

is a function of environmental or nongenetic
effects only, while the variance estimate within
dizygotic (WD) pairs for the jth test variable

$$\mathrm{MSWD}_j = \frac{\sum_{i=1}^{N_d} (-d)_{ij}^2}{2N_d}$$

is a function of both genetic and environmental
effects. If we assume that the environmental
contributions are the same in the two series of
twins, we get the heritability coefficient

$$h_j = \frac{\mathrm{MSWD}_j - \mathrm{MSWM}_j}{\mathrm{MSWD}_j}$$

or the ratio of genetic to total variance in the
dizygotic differences. It does not matter for
purposes of maximization, but it is worth
noting that this coefficient estimates heritability
in the dizygotic twin differences and,
hence, underestimates the heritability between
dizygotic pairs or in the general population.
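
The univariate coefficient above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `mean_square_within` and `heritability` are hypothetical, and the twin scores are invented for the example rather than taken from any study.

```python
def mean_square_within(pairs):
    """Within-pair variance estimate: sum of squared twin differences / (2N)."""
    n = len(pairs)
    return sum((twin1 - twin2) ** 2 for twin1, twin2 in pairs) / (2 * n)


def heritability(mz_pairs, dz_pairs):
    """Heritability coefficient (MSWD - MSWM) / MSWD for one test variable."""
    mswm = mean_square_within(mz_pairs)  # monozygotic: nongenetic effects only
    mswd = mean_square_within(dz_pairs)  # dizygotic: genetic plus nongenetic
    return (mswd - mswm) / mswd


# Hypothetical (twin 1, twin 2) scores, invented for illustration only.
mz = [(10, 11), (14, 13), (9, 9), (12, 13)]
dz = [(10, 12), (14, 11), (9, 10), (12, 14)]
h = heritability(mz, dz)
```

With these made-up scores the monozygotic estimate is 3/8 and the dizygotic estimate is 18/8, so the coefficient is 5/6; real data would of course require far more pairs.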

We wish to construct a linear composite of
all p variables

$$(xy)_i = a_1 (xy)_{i1} + a_2 (xy)_{i2} + \cdots + a_p (xy)_{ip}$$

that maximizes h or, which is the same thing,
that maximizes MSWD subject to the restraint
that MSWM equals unity. Let V be the
$p \times p$ matrix with elements

$$v_{jk} = \frac{\sum_{i=1}^{N_d} (-d)_{ij} (-d)_{ik}}{2N_d}$$

and W the $p \times p$ matrix with elements

$$w_{jk} = \frac{\sum_{i=1}^{N_m} (-m)_{ij} (-m)_{ik}}{2N_m}$$

It follows that

$$\mathrm{MSWD} = \frac{\sum_{i=1}^{N_d} (-d)_i^2}{2N_d} = aVa', \qquad [1]$$

and

$$\mathrm{MSWM} = \frac{\sum_{i=1}^{N_m} (-m)_i^2}{2N_m} = aWa', \qquad [2]$$

where a is the row vector $(a_j)$.

To maximize MSWD while holding MSWM
equal to unity, we differentiate

$$\mathrm{MSWD} + l(1 - \mathrm{MSWM}),$$

where l is the Lagrange multiplier, and set the
result equal to zero. Doing so, we get

$$a(V - lW) = 0, \qquad [3]$$

where l must be chosen so that

$$|V - lW| = 0. \qquad [4]$$

Premultiplying by $W^{-1}$ we get

$$|W^{-1}V - lI| = 0.$$

Hence, the solutions for l are the characteristic
roots of the matrix $W^{-1}V$. Their meaning is
given by

$$l = aVa' = \mathrm{MSWD},$$

which follows from Equations 1, 2, 3, and the
requirement that MSWM equal unity; the
roots are real and positive. Hence, if we take
the largest root $l_1$ and calculate

$$h = \frac{l_1 - 1}{l_1}$$

we obtain the maximal heritability.

To obtain the row vector a for
the maximal composite, we return to Equation
3, in which $l_1$ as well as the matrices V
and W are known, and solve. The
weights are unique up to multiplication by a constant
factor. We may then fix them by the condition
that MSWM equal unity or, since multiplication
by a constant does not affect the heritability of
the composite, by any convenient scaling; in
the illustrations the
largest $a_j$ is set equal to unity.
ILLUSTRATIONS


As part of a large-scale investigation into
hereditary components in normal behavior,
Loehlin and Vandenberg (1968) collected 123
monozygotic and 75 like-sexed dizygotic twin
pairs, drawn from the public schools in four
Michigan cities and in Louisville, Kentucky,
and administered to them three subscales for
each of five factors in Thurstone’s Primary
Mental Abilities (PMA) test: Numerical,
Verbal, Spatial, Word Fluency, and Reasoning
Ability. The results for Verbal and Reasoning
Ability and their subscales are used here.
Table 1 reproduces the mean results from
Loehlin and Vandenberg’s original report.
None of the differences between monozygotic
and dizygotic twin pairs are significant. The
Michigan sample scored significantly higher
than the Louisville sample on the Sentences
subscale; the Louisville sample scored significantly
higher on Letter Groupings; and the
girls did significantly better than the boys on
bscales, the usual 
in the PMA, Since 
ot appreciably to 
ces, on which the 
ds, we follow the 
ng the results by 


heritability 
original inve 


Table 2 contains the product matrices W and
V for the Verbal and Reasoning subscales.
These within-pair data contain all that is
necessary for the heritability analyses; that
is, we need no information other than the
matrices W and V for the verbal subscales to
TABLE 1 
Means for Verbal and Reasoning and Their Subscales by Zygosity, Sample, and Sex

Scale                 Mz     Dz     Michigan  Louisville  Male   Female
1. Sentences          17.9   18.6   19.4      17.3        --     18.2
2. Vocabulary         27.8   28.2   28.7      27.5        27.8   --
3. Completion         27.5   27.3   28.2      26.9        28.4   --
Verbal                73.3   74.1   76.1      71.9        74.5   --
4. Letter Series      15.5   15.5   15.3      --          --     --
5. Letter Groupings   14.5   14.3   13.7      14.9        13.7   --
6. Pedigrees          23.4   24.5   24.6      23.2        21.7   --
Reasoning             53.4   54.6   53.7      54.0        50.5   56.9

Note. Reprinted from an article by J. C. Loehlin and S. G. Vandenberg in Progress in Human Behavior Genetics by Steven G.
Vandenberg. Copyrighted by Johns Hopkins University Press, 1968.
TABLE 2

Product Matrices W and V for the Verbal and Reasoning Subscales of the
Primary Mental Abilities (PMA) Tests

                         Monozygotic (W)           Dizygotic (V)
Subscale                 1       2       3         1       2       3
Verbal
1. Sentences             19.66   13.50    4.93     37.90   26.63   23.28
2. Vocabulary            13.50   46.44   17.10     26.63   62.97   32.61
3. Completion             4.93   17.10   26.12     23.28   32.61   59.09
Reasoning
1. Letter Series         27.64    3.05    5.80     29.31   10.27   12.55
2. Letter Groupings       3.05   14.86    4.83     10.27   24.52    5.40
3. Pedigrees              5.80    4.83   56.39     12.55    5.40   65.46

Note. Reprinted from an article by J. C. Loehlin and S. G. Vandenberg in Progress in Human Behavior Genetics by Steven G.
Vandenberg. Copyrighted by Johns Hopkins University Press, 1968.


find out how Sentences, Vocabulary, and 
Completion should be weighted so as to maxi- 
mize the heritability of the composite factor 
score, and similarly for the subscales of 
Reasoning. 

Once the product matrices are available we
write Equation 4 in numerical form and solve
for l; in our two examples, the equation is
cubic. The roots are 2.77, 1.37, and 1.19 for
Verbal and 1.75, 1.22, and 0.80 for Reasoning.
Since these roots represent maximal values of
the ratio of the dizygotic to the monozygotic
within-pair mean square, we take the largest root for each factor
to the exclusion of the others. The largest root
for Verbal gives a heritability of .64, and the
largest root for Reasoning a heritability of .43.
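
The computation just described can be sketched numerically. The sketch below is not the authors' procedure: it assumes that Equation 4 (not reproduced in this section) has the determinantal form |V - lW| = 0 and that heritability is obtained from a root as 1 - 1/l, both of which are consistent with the reported figures. The function names and the use of power iteration are choices made for this illustration only; the matrices are those of Table 2.

```python
# Sketch: largest root of |V - l*W| = 0 by power iteration on W^-1 V.
# Assumptions (not stated in the text): the determinantal form of Equation 4
# and heritability = 1 - 1/l. Matrices are copied from Table 2.

W_VERBAL = [[19.66, 13.50, 4.93], [13.50, 46.44, 17.10], [4.93, 17.10, 26.12]]
V_VERBAL = [[37.90, 26.63, 23.28], [26.63, 62.97, 32.61], [23.28, 32.61, 59.09]]
W_REASON = [[27.64, 3.05, 5.80], [3.05, 14.86, 4.83], [5.80, 4.83, 56.39]]
V_REASON = [[29.31, 10.27, 12.55], [10.27, 24.52, 5.40], [12.55, 5.40, 65.46]]


def mat_vec(m, x):
    """Product of a square matrix and a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def det3(m):
    """Determinant of a 3 x 3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, y):
    """Solve m x = y for a 3 x 3 system by Cramer's rule."""
    d = det3(m)
    solution = []
    for j in range(3):
        mj = [row[:] for row in m]
        for r in range(3):
            mj[r][j] = y[r]
        solution.append(det3(mj) / d)
    return solution


def quad(m, x):
    """Quadratic form x' m x."""
    return sum(a * b for a, b in zip(x, mat_vec(m, x)))


def largest_root(w, v, iterations=500):
    """Return (largest root l, weights scaled so the largest weight is 1)."""
    x = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        x = solve3(w, mat_vec(v, x))      # x <- W^-1 V x
        scale = max(x, key=abs)           # signed element of largest size
        x = [xi / scale for xi in x]
    return quad(v, x) / quad(w, x), x


def heritability(root):
    """Heritability implied by a root, assuming h = 1 - 1/l."""
    return 1.0 - 1.0 / root


if __name__ == "__main__":
    for name, w, v in (("Verbal", W_VERBAL, V_VERBAL),
                       ("Reasoning", W_REASON, V_REASON)):
        root, weights = largest_root(w, v)
        print(name, round(root, 2), round(heritability(root), 2),
              [round(x, 2) for x in weights])
```

Under these assumptions the weights returned with the largest root can be compared with the maximal weights of Table 3.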

Once l has been determined, we return to 
Equation 3 and solve for the maximal weights. 
These weights appear in Table 3 together with 
the univariate heritabilities for the subscales. 
Plainly, both the univariate heritabilities and 
the maximal weights vary greatly from sub- 
scale to subscale. The Vocabulary subscale, 
for example, has a low heritability and con- 
tributes little to the heritability of the Verbal 
composite. If Vocabulary is dropped, a composite
formed from the remaining two subscales
has a heritability of .61. In the Reasoning
set, Letter Groupings gives to the composite 
almost all of its modest heritability. Letter 
Series and Pedigrees have very low heritabili- 
ties, and their inclusion in the composite 
increases its heritability by only 4 points, 
from .39 to .43. 
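
The effect of dropping Vocabulary can be checked with a short sketch. It assumes, as above, that the largest root of |V - lW| = 0 for the reduced 2 x 2 matrices gives the maximal heritability as 1 - 1/l; the function name is hypothetical, and the entries are the Sentences and Completion rows and columns of Table 2.

```python
import math

# Sentences and Completion entries of the Verbal matrices in Table 2.
W2 = [[19.66, 4.93], [4.93, 26.12]]      # monozygotic (W)
V2 = [[37.90, 23.28], [23.28, 59.09]]    # dizygotic (V)


def largest_root_2x2(w, v):
    """Larger root of det(V - l*W) = 0 for symmetric 2 x 2 matrices.

    Expanding the determinant gives a*l**2 + b*l + c = 0 with the
    coefficients below; a is positive when W is positive definite.
    """
    a = w[0][0] * w[1][1] - w[0][1] ** 2
    b = -(v[0][0] * w[1][1] + w[0][0] * v[1][1] - 2 * v[0][1] * w[0][1])
    c = v[0][0] * v[1][1] - v[0][1] ** 2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


if __name__ == "__main__":
    root = largest_root_2x2(W2, V2)
    print(round(root, 2), round(1 - 1 / root, 2))
```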


The standard procedure for obtaining a
total score from the subscales of the PMA is 
to add the subscale scores. The use of herit- 
ability as a criterion would militate against 
this procedure because it lowers the overall 
heritabilities from .64 and .43 to .50 and .28; 
in both cases the standard procedure results 
in a composite which is less heritable than one 
of its components. Heritability as a criterion 
would mean differential weighting of the sub- 
scales. In the absence of offsetting considera- 
tions, it would probably mean deletion of 
Vocabulary from the Verbal composite and 
Letter Series and Pedigrees from the Reasoning 
composite.
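
The comparison between unit weights and maximal weights can be illustrated as follows. This is a sketch under the assumption that the heritability of a composite with weight vector w is 1 - w'Ww / w'Vw, a form that reproduces the reported values but is not quoted from the text; the function name is hypothetical.

```python
# Matrices copied from Table 2 (W: monozygotic, V: dizygotic).
W_VERBAL = [[19.66, 13.50, 4.93], [13.50, 46.44, 17.10], [4.93, 17.10, 26.12]]
V_VERBAL = [[37.90, 26.63, 23.28], [26.63, 62.97, 32.61], [23.28, 32.61, 59.09]]
W_REASON = [[27.64, 3.05, 5.80], [3.05, 14.86, 4.83], [5.80, 4.83, 56.39]]
V_REASON = [[29.31, 10.27, 12.55], [10.27, 24.52, 5.40], [12.55, 5.40, 65.46]]


def composite_heritability(weights, w, v):
    """Heritability of a weighted composite, assuming h = 1 - w'Ww / w'Vw."""
    def quad(m):
        return sum(weights[i] * m[i][j] * weights[j]
                   for i in range(len(weights))
                   for j in range(len(weights)))
    return 1.0 - quad(w) / quad(v)


if __name__ == "__main__":
    unit = [1.0, 1.0, 1.0]
    # Simple sum of subscale scores (the standard PMA procedure).
    print(round(composite_heritability(unit, W_VERBAL, V_VERBAL), 2))
    print(round(composite_heritability(unit, W_REASON, V_REASON), 2))
    # Maximal weights as printed in Table 3.
    print(round(composite_heritability([0.82, -0.36, 1.00], W_VERBAL, V_VERBAL), 2))
    print(round(composite_heritability([0.25, 1.00, -0.07], W_REASON, V_REASON), 2))
```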


TABLE 3

UNIVARIATE HERITABILITIES AND MAXIMAL WEIGHTS
FOR THE VERBAL AND REASONING SUBSCALES
OF THE PRIMARY MENTAL ABILITIES
(PMA) TESTS

                         Univariate      Maximal
Subscale                 heritability    weight

Verbal
1. Sentences                 .48            .82
2. Vocabulary                .26           -.36
3. Completion                .56           1.00

Reasoning
1. Letter Series             .06            .25
2. Letter Groupings          .39           1.00
3. Pedigrees                 .14           -.07



Heritability is not always an appropriate 
criterion in the construction of a psychological 
test. Political and social attitudes, for example, 
enjoy important relations with other personality
variables (Adorno, Frenkel-Brunswik,
Levinson, & Sanford, 1950) and show sub- 
stantial stability over long periods of time 
(Kelly, 1955). They are also strongly familial, 
witness the powerful association between parents
and children in party of registration
(Republican or Democrat). Nevertheless, no
one supposes that specific political and social 
attitudes are hereditary in any sense which 
would recommend an approach through twins. 
Wherever a trait is known to depend entirely 
or almost entirely on environmental shaping, 
building a test to maximize heritability is 
inappropriate. 

At the same time, there are other test varia- 
tions in which heritability is particularly 
apposite—suggestibility, for example. Tests of 
suggestibility are heritable (Eysenck & Prell, 
1951) and transsituational to an exceptional 
degree (Goodson & Jones, 1962). The Hull 
body-sway test is the best known, but there 
are many tests other than Hull’s which also 
qualify as good measures of suggestibility.
These other tests, for example, arm gravitation
or levitation, eyelid or arm catalepsy,
are generally included in a battery of tests
designed to assess overall susceptibility to
primary, direct suggestion. In the absence of 
evidence which excludes it as a meaning for 
“basic,” heredity is always a candidate. When, 
as in the case of suggestibility, existing evidence 
confirms the role of heredity and the test varia- 
tion is transsituational into the bargain, then 
using heritability as a criterion in the assembly 
of a test battery is definitely indicated. 

Schedules of neurological and motor development
in infancy and early childhood are another
area in which heritability applies with particular
point. Individual differences in early
rates of development are consistent (Thomas,
Chess, Birch, Hertzig, & Korn, 1964) and
heritable (Freedman & Keller, 1963). In the
first two years of life, at any rate, development
is an essentially biological process; more
importantly, our interest tends to center in
those aspects of early growth and maturation
which are under especially strong genetic
control. Much of the point in assessing early
development is to track differences which
adumbrate on a genetic basis serious trouble
for the child at issue. Here again, therefore,
it would make sense to assemble a schedule
using heritability as the criterion.

The range of test situations in which herit- 
ability might play a fruitful role, while inter- 
esting, is not a critical question. What matters 
is the recognition that in some situations 
heritability as a secondary criterion in the 
construction of a psychological test both
applies and can be applied. 
Petrinovitch and Hardyck's recommendations (1969) are criticized. It is shown that
for two mean contrasts, the techniques they studied are applications of the same
statistic; only different critical values for significance and/or the use of sequential
procedures distinguish the tests. Their results mainly reflect the usual relationship
that a conservative criterion that protects against a Type I error has a larger risk
of Type II error than a less conservative criterion. Petrinovitch and Hardyck's
recommendation of the usage of only the Scheffé and Tukey tests is only an arbitrary
preference for conservative tests. Their conclusions on the effects of unequal variances
and unequal ns are disputed; an alternative technique for the unequal n case
is proposed.


Petrinovitch and Hardyck (1969) made a
worthwhile contribution on the robustness of
the statistic

$$q_o = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{MS_E/n}}$$

where $MS_E$ is the mean square error for the
analysis under consideration, and n is the number
of cases on which each mean is based
($n_i = n_j = n$). Unfortunately, they do not
seem to recognize that all of the tests they
studied are based upon this one statistic, sometimes
in disguised form. Their recommendations
are written as if they were contrasting
entirely different statistics in the same tradition
as Boneau's (1962) comparison of the
robustness and power of the t and
Mann-Whitney U.

When the problem is limited to comparisons
of pairs of two means, the Scheffé method, the
two Tukey methods, the Duncan multiple
range (1955), and the Newman-Keuls (N-K)
test can all be converted to a common statistic.
The Scheffé test compares the absolute value
of the paired mean contrast, $|\bar{X}_i - \bar{X}_j|$, to the
critical value

$$S = \sqrt{(k - 1)F_\alpha}\,\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

(Kirk, 1968, p. 91). For convenience, assume a


1 Requests for reprints should be sent to Paul A. 
Games, Department of Educational Psychology, Penn- 
sylvania State University, University Park, Pennsyl- 
vania 16802. 


k treatment independent groups design so
$MS_E = MS_{\text{within cells}}$ and $F_\alpha$ has $k - 1$ and
$N - k$ degrees of freedom ($N = \sum n_i$). For paired
means, a general contrast significant by the
Scheffé criterion may be rewritten as

$$|t_o| = \frac{|\bar{X}_i - \bar{X}_j|}{\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}} > \sqrt{(k - 1)F_\alpha}.$$

For the equal n case, $t_o$ reduces to

$$t_o = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{2MS_E/n}},$$

and we may convert to the $q_o$ statistic by multiplying
both sides of the above inequality by
$\sqrt{2}$, obtaining

$$|q_o| > \sqrt{2(k - 1)F_\alpha}.$$
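
As a numerical illustration of this conversion, the sketch below computes the Scheffé critical value on the q scale from a tabled F value. The function name is hypothetical, and the two F values (3.885 for 2 and 12 degrees of freedom, 2.62 for 5 and 24) are taken from a standard .05 F table rather than from the article, so they should be treated as assumptions of the example.

```python
import math


def scheffe_q_critical(k, f_critical):
    """Scheffe critical value expressed on the q scale: sqrt(2 (k - 1) F)."""
    return math.sqrt(2 * (k - 1) * f_critical)


if __name__ == "__main__":
    # F values assumed from a standard .05 table (not quoted in the article).
    print(round(scheffe_q_critical(3, 3.885), 2))   # k = 3, N - k = 12
    print(round(scheffe_q_critical(6, 2.62), 2))    # k = 6, N - k = 24
```

The resulting values may be compared with the Scheffé row of Table 1.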


The Tukey Wholly Significant Difference
(WSD) test (Miller, 1966, p. 92) is the application
of the same statistic ($q_o$) with a critical
value ($q_\alpha$) obtained from the kth column and
(N - k)th row of the studentized range table
(Kirk, 1968, p. 531). The Scheffé and WSD
critical values are used on all pairs of means.
In contrast, the Newman-Keuls (N-K) test
(Keuls, 1952; Newman, 1939) decreases $q_\alpha$ as
the ranked range of the means (h) decreases.
When $\bar{X}_i$ and $\bar{X}_j$ are the largest and smallest
means from a set of h means, then $q_o$ is a
studentized range statistic (q). When h = k,
the critical value of the N-K test is identical
to that of the WSD test. If this initial comparison
is significant, then one proceeds to
test “lower rank differences” using q and $q_\alpha$
for the two sets of ranked means where
h = k - 1. The Tukey B test (Winer, 1962,
p. 87) compares q with the mean of the WSD
and N-K critical values.
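
The three procedures can be contrasted in a short sketch. The critical values are those given in Table 1 for k = 3 and 12 error degrees of freedom; the three means, the mean square error of 5.0, and n = 5 are hypothetical numbers chosen only so that the tests disagree, and the function name is an illustration rather than anything defined in the article.

```python
import math

# Critical values of q from Table 1 for k = 3, 12 error degrees of freedom.
NK = {3: 3.77, 2: 3.08}        # Newman-Keuls: value shrinks with the range h
WSD = {3: 3.77, 2: 3.77}       # Tukey WSD: one value for every pair
TUKEY_B = {h: (NK[h] + WSD[h]) / 2 for h in NK}   # mean of the two


def stepwise_pairs(means, ms_error, n, crit_by_range):
    """Return the pairs (index of smaller mean, index of larger mean) declared
    significant when ranges are tested from widest to narrowest and a range
    lying inside a nonsignificant range is not tested. For the WSD the skip
    changes nothing, because its critical value does not depend on h.
    """
    k = len(means)
    order = sorted(range(k), key=lambda i: means[i])   # smallest mean first
    se = math.sqrt(ms_error / n)                       # denominator of q_o
    significant, retained = set(), []
    for h in range(k, 1, -1):
        for lo in range(k - h + 1):
            hi = lo + h - 1
            if any(a <= lo and hi <= b for a, b in retained):
                continue
            q_o = (means[order[hi]] - means[order[lo]]) / se
            if q_o > crit_by_range[h]:
                significant.add((order[lo], order[hi]))
            else:
                retained.append((lo, hi))
    return significant


if __name__ == "__main__":
    means = [10.0, 13.2, 14.0]   # hypothetical group means
    for name, table in (("N-K", NK), ("Tukey B", TUKEY_B), ("WSD", WSD)):
        print(name, sorted(stepwise_pairs(means, 5.0, 5, table)))
```

With these hypothetical data the difference of 3.2 between the two lowest means exceeds the N-K value for h = 2 but neither the Tukey B nor the WSD value, which is the ordering of liberality described in the text.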

The Type I experimentwise error rate is the 
probability of one or more rejections of a true 
null hypothesis of equal means, in a given 
experiment. The Type I error rate per com- 
parison is the probability that any particular 
one of the comparisons will be significant when 
the null is true (Ryan, 1962). When the overall 
null hypothesis is true, the WSD, Tukey B, 
and N-K tests automatically have the same 
experimentwise rate for Type I errors, since 
all apply the same critical value to the largest 
difference. Petrinovitch and Hardyck’s findings
of an identical experimentwise Type I error
rate of .054 mainly confirm the accuracy of
their computer programming. The error rate
per comparison may differ, since once a Type I
error has been made, the Tukey B has a lower
critical value for the next ranked difference,
and the N-K has a lower one still.

The q statistic, the studentized range distribution,
and sequential testing identical to
that of the N-K test are used in the Duncan
multiple range test but with more liberal critical
values. The “protection level” against
Type I errors using Duncan’s test is $(1 - \alpha)^{k-1}$;
for a k treatment experiment, the experimentwise
risk of a Type I error is $1 - (1 - \alpha)^{k-1}$.
Using $\alpha = .05$, the theoretical experimentwise
rate of Type I error for k = 3 is .098, for k = 6
is .226, and for k = 10 is .370. Petrinovitch and
Hardyck obtained empirical estimates of .098,
.22 (estimated from Figure 4, p. 47), and .353,
respectively.
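
The theoretical rates quoted above follow directly from the protection-level formula. A minimal sketch, in which the function name is hypothetical:

```python
def duncan_experimentwise_rate(k, alpha=0.05):
    """Experimentwise Type I risk for Duncan's test: 1 - (1 - alpha)**(k - 1)."""
    return 1 - (1 - alpha) ** (k - 1)


if __name__ == "__main__":
    for k in (3, 6, 10):
        print(k, f"{duncan_experimentwise_rate(k):.4f}")
```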

An experimenter uses the most liberal criterion
of all if he conducts t tests on all pairs
of means. Using $MS_E$ as the estimate of common
variance, $q_o = t\sqrt{2}$, so a significant outcome
is obtained when $q_o > t_\alpha\sqrt{2} = q_\alpha$ for
h = 2 ($t_\alpha$ excludes $\alpha$ of the t distribution in the
two tails).

For the equal n case on paired means, all of
the above methods thus differ on the critical
point specified and whether a sequential procedure
on ranked means is used. They are listed
in Table 1 from most conservative (lowest risk
of Type I error) at top to most liberal at the
bottom. This ranking is automatically an inverse
ranking in protection against Type II
errors. The Scheffé test’s protection against
Type I errors must be paid for by its lack of
power when there are moderate differences
between the means.

It should be noted that the several criteria 
applied to the $q_o$ statistic for these different
tests vary only slightly with $df_e$ ($N - k$), but
they vary greatly with increases in k. It is not 
surprising that Petrinovitch and Hardyck 
found the tests at the top are only slightly less 
powerful than the bottom tests when working 
with large differences between the means and 
the k = 3 case. With k = 3, the critical values 
of the tests are still reasonably close together, 
and at upper points on the power curve, only 
moderate differences in the power of the tests 


TABLE 1
NOMINAL .05 CRITICAL VALUES (CV) OF q FOR VARIOUS COMBINATIONS OF k AND n

                                     k = 3                  k = 6                  k = 10
Method                         dfe = 12  dfe = 120    dfe = 24  dfe = 120    dfe = 40  dfe = 120
Scheffé                          3.94      3.50         5.12      4.78         6.17      ...
WSD (k ranks) a                  3.77      3.36         4.37      4.10         4.74      4.56
Tukey B for (k - 1 rank)         3.43      3.08         4.27      4.01         ...       4.51
Newman-Keuls for (k - 1 rank)    3.08      2.80         4.17      3.93         4.64      4.47
Duncan (k rank)                  3.23      2.95         3.28      3.17         ...       3.31
t                                3.08      2.80         2.92      2.80         2.86      2.80

a The k rank critical value is identical for the Tukey WSD and the N-K test, and hence also for the Tukey B test.
TABLE 2

THEORETICAL TYPE I ERROR RATES FOR THE TABLE 1 FIXED CRITICAL VALUES

                      k = 3                          k = 6                          k = 10
           N - k = 12    N - k = 120      N - k = 24    N - k = 120      N - k = 40      N - k = 120
Method     RC     REW    RC     REW       RC     REW    RC     REW       RC      REW     RC      REW
Scheffé   .016   .041   .015   .039      .001   .015   .001   .012      .00009  .003    .00005  .002
WSD       .021   .050   .018   .050      .005   .050   .004   .050      .002    .050    .002    .050
t         .050   .116   .050   .121      .050   .338   .050   .359      .050    .890    .050    .616

Note. RC = Rate per comparison; REW = Rate experimentwise.


are possible. The smallest difference between 
the means (.6 standard deviations) that Pet- 
rinovitch and Hardyck used is close to the .5 
standard deviation point that Cohen (1962) 
described as representing a medium difference. 
Cohen found that only 25% of the articles in 
one volume of the Journal of Abnormal and 
Social Psychology had a power of .60 or better 
of detecting this moderate discrepancy. The 
larger discrepancies that Petrinovitch and 
Hardyck used are close to the point where sta- 
tistics are unnecessary. Even the “eyeball” 
test should distinguish between groups where 
the means are 1.6 standard deviations apart 
and n is 30. Naturally, all tests show a high
power at such discrepancies, so the power dif- 
ferences between tests are minimized. If 
Petrinovitch and Hardyck would use the 
k = 10 case, with moderate discrepancies, and 
ns typical of psychological experimentation, 
the lack of power of the Scheffé test and WSD 
test under these conditions would become apparent.
It is under the large k case that the
multiple comparisons techniques are primarily 
needed, and where they differ most from each 
other. 

It is possible to obtain a theoretical assessment
for the Type I error rate experimentwise
for all the tests used, and the Type I error rate
per comparison for those tests using a fixed
criterion at all stages. This may be done by
using the Harter, Clemm, and Guthrie (1959)
tables of the studentized range. Using the
critical value (CV) for a given test (as from
Table 1) and the h = k column, the Harter
et al. tables yield the portion of the curve below
this point. Subtraction from 1.0 yields the
experimentwise Type I error rate. The Type I
error rate per comparison may be found using
the values in the h = 2 column in the same
tables, or it may be found by dividing CV by
the square root of 2, and finding the corresponding
two-tailed area of the t distribution
(Pearson & Hartley, 1966, pp. 138-140). The
rates of Type I error corresponding to the
critical values in Table 1 are shown in Table
2 for the tests with fixed criteria. Table 2
emphasizes the relation between the t and
WSD tests. The t sets the error rate per comparison
at the nominal alpha (.05 here), and
lets the experimentwise rate rise with k. The
WSD sets the error rate experimentwise, and
lets the error rate per comparison decrease as k
increases.
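
The second route to the per comparison rate (dividing CV by the square root of 2 and taking the two-tailed area of the t distribution) can be sketched without tables. The numerical integration by Simpson's rule and the function names are choices made for this illustration, standing in for the printed tables the article cites; the critical values are those of Table 1 for k = 3 and 12 error degrees of freedom.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_t_area(t_value, df, steps=2000):
    """Area of the t distribution beyond +/- t_value (Simpson's rule on [0, |t|])."""
    t_value = abs(t_value)
    h = t_value / steps
    total = t_density(0.0, df) + t_density(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_half = total * h / 3          # area between 0 and |t|
    return 1 - 2 * central_half


def rate_per_comparison(cv_q, df):
    """Per comparison Type I rate implied by a fixed critical value on the q scale."""
    return two_tailed_t_area(cv_q / math.sqrt(2), df)


if __name__ == "__main__":
    for name, cv in (("Scheffe", 3.94), ("WSD", 3.77), ("t", 3.08)):
        print(name, f"{rate_per_comparison(cv, 12):.3f}")
```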

Theoretical values for the Type I error rates
experimentwise and per comparison may be
computed for the cases in Petrinovitch and
Hardyck's Figures 1, 2, 4, and 6. The values
in the figures are consistent with theory except
for the values for the t test (Petrinovitch and
Hardyck's t) when n = 5, and k = 3 or 6.
These values differ from the theoretical experimentwise
rate (p < .01), and are also consistently
low in the per comparison rate. This
result should be replicated before it is attributed
to other than error.

Since the same statistic may be used in all 
of the above tests, it should be clear that there 
is no universal “best choice" among them. Any 
reduction in risk of Type I error is paid for by 
an increase in the risk of a Type II error. 
(As usual, the magnitude of the increase will 
depend on the discrepancy from the null, and 
can be quantified only for specific situations.) 
Petrinovitch and Hardyck concluded by reject- 
ing all techniques more liberal than the Tukey 
B as having too large a Type I error rate, and 
recommended: “If differences are not found to
be significant by the Scheffé, the method of
choice would seem to be Tukey B which fixes
experimentwise error rates at conventional
levels and which shows little deviation as a
result of the violation of assumptions [p. 53].”
From the few comparisons available in their
Table 3 (Row a with Row f, and Row b with
Row g, p. 51), it seems that most of the tests
have about the same probabilities when using
mixed exponential populations and normal 
populations as they had under normal popula- 
tions alone. However, there is little to suggest 
that the Tukey B is better in this respect than 
the other tests. These data on the robustness 
of the qo statistic are welcome, but extremely 
limited, and certainly not comparable to the 
extensive data on the robustness of the F sta- 
tistic in the independent groups analysis of 
variance (Box, 1954; Games & Lucas, 1966; 
Srivastava, 1959) or its two-group transform, 
the t test (Boneau, 1960; Bradley, 1964).

A major question will be whether differ- 
ences between extremes in a larger set of means 
will be more sensitive to form and heterogeneity 
of variance violations than is the two means 
case. This is essentially a matter of whether 
the larger critical values of the more conserva- 
tive tests will still be accurate. We can expect 
systematic relationships between violations on 
the several tests. For example, if the sampling 
distribution of q for h = 6 under some violation
should prove to have a larger right-hand
tail than the studentized range distribution,
this would increase the Type I error rate for all
tests, in various degrees. It would seem more
profitable to plot the sampling distribution
separately for the various ranks (hs) so that
experimentwise and per comparison rates of
error could be estimated for any fixed points.

Petrinovitch and Hardyck continued: “The
Tukey B is also somewhat more sensitive to
real differences between groups [p. 53].” Of
course, and the N-K test is more sensitive to
real differences (more powerful) than the
Tukey B, and the Duncan test is more powerful
than the N-K, and the multiple t test is more
powerful than Duncan’s. Their recommendation
is one arbitrary point on a continuum of
choice. In addition, when one specifies a conservative
test, and then says that if this conservative
test is not significant, he will use a
more liberal test, he is merely adding ambiguity

PAUL A. GAMES 


and inconsistency to his decision rule. He
should instead admit to using the lower critical
value in the first place. This is analogous to
running a test at the .01 level, and if not
significant rerunning at the .05 level. The reporting
of a minimum alpha or extreme area
(Blommers & Lindquist, 1960) is acceptable,
but one is misusing terms if he adopts the
above procedure and then says he is using a
.01 level of significance.

Rather than specify a single fixed point on
the continuum as “best,” it is more appropriate
to consider the consequences of Type I and
Type II errors in various situations and to
suggest whether a conservative or liberal critical
value would be most appropriate. For situations
in which collecting data is quite expensive,
and one or more clearly specified a priori
hypotheses are available, the author would
move to the liberal end and permit several
(less than k - 1) t tests per experiment. Anyone
who opposes this recommendation should
also reject the running of separate Fs on main
effects and interactions in the factorial analysis
of variance, since the latter is a special case of
several orthogonal tests. Norton and Bulgren’s
results (see Footnote 2) suggest that running c (c < k - 1)
a priori nonorthogonal ts has a lower experimentwise
risk of a Type I error than does the
running of c orthogonal contrasts. Their results
also suggest that using the common error term,
$MS_E$, in all tests does not increase the rate of
Type I errors above the theoretical rate for
independent tests, $1 - (1 - \alpha)^c$ (at least for
$df_e > 40$).
In contrast, before investing extensive labor
on a post hoc difference not anticipated by
prior theory, it would be advisable to be well
protected against Type I errors. Scheffé’s test
is well designed for post hoc data snooping, and
is recommended when an experimenter wishes
to combine three or more groups in a complex
post hoc contrast suggested by the data.
In short, in multiple comparisons, as in other
versions of hypothesis testing, statisticians who
view the process as a decision-making technique
will allow the relative risks of a Type I and
Type II error to vary with their assessment
2 Norton, W., and Bulgren, W. G. A sampling study
of certain multiple comparison procedures. Paper presented
at the meeting of the American Educational
Research Association, February 1965.

TYPE I AND TYPE II ERRORS IN MULTIPLE COMPARISONS


of the consequences of each type of error. Ex- 
perimenters are better advised to match the 
test to the situation. 

Petrinovitch and Hardyck stated (on the
equal n case): “With unequal variances the
general result is an even greater loss of power
when the samples are small [pp. 49-50].” This
result is an artifact due to their method of increasing
variances. Treating their initial variances
as $\sigma^2$, their unequal variance populations
had a mean variance of $7\sigma^2$. They thus effectively
decreased the distance between the
means relative to the mean square error. Their
summary of the effects of unequal ns combined
with unequal variances suffers from the same
error. With equal ns, the work of Box (1954)
and Winkler (1968) assures us that the analysis
of variance works quite well, but we may have
to worry about effects on extreme pairs of
means in the multiple-comparison tests. For
the case of unequal ns positively correlated
(across populations) with unequal population
variances, the estimate of $MS_{\text{within cells}}$
will tend to be too large, the rate of Type I
error will be lower than alpha, and power will
be similarly depressed. However, if the correlation
between $n_i$ and $\sigma_i^2$ is negative, then
$MS_{\text{within cells}}$ will tend to be too small, and both
the risk of Type I errors and power will be
artificially inflated.

Although Petrinovitch and Hardyck provide
some evidence that the use of $q_o$ with ns
as discrepant as 5 and 15 still produces reasonable
experimentwise and per comparison
error rates, there is little to recommend the
practice. In their results, the rates averaged
over all contrasts are balanced by too many
Type I errors on the n = 5 comparisons and
too few on the n = 15, thus leading to systematic
error. All of the tests may be converted to
a t form that permits unequal ns and an accurate
estimate of the standard error of the difference
between the two means. A significant outcome
may be represented by

$$|t_o| = \frac{|\bar{X}_i - \bar{X}_j|}{\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}} > \frac{CV}{\sqrt{2}}$$

where CV is the critical value for the $q_o$ statistic.
The author has suggested this to recent
classes, and a search of the literature revealed
it was not a unique idea. Kramer (1956) suggested
this for the N-K and Duncan’s tests,
and Steel and Torrie (1960) suggested it for
the WSD. Again the crucial question is whether
the critical values for h > 2 remain accurate.
However, even sizable discrepancies should
have less serious consequences than ignoring
the differences between the $1/n_i + 1/n_j$ terms
in the denominator for various i and j. For ns
of 5 and 15 this is ignoring the difference between
6/15, 4/15, and 2/15.
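
A brief sketch of the unequal n form may make the last point concrete. The function names are hypothetical, and the mean difference of 3.0, the mean square error of 5.0, and the critical value of 3.77 are illustrative numbers only, not values from any study discussed here.

```python
import math
from fractions import Fraction


def variance_factor(n_i, n_j):
    """The 1/n_i + 1/n_j term of the denominator, as an exact fraction."""
    return Fraction(1, n_i) + Fraction(1, n_j)


def t_o(mean_i, mean_j, ms_error, n_i, n_j):
    """Contrast statistic with the standard error based on the actual ns."""
    return (mean_i - mean_j) / math.sqrt(ms_error * (1 / n_i + 1 / n_j))


def is_significant(mean_i, mean_j, ms_error, n_i, n_j, cv_q):
    """Significant when |t_o| exceeds CV / sqrt(2), CV being on the q scale."""
    return abs(t_o(mean_i, mean_j, ms_error, n_i, n_j)) > cv_q / math.sqrt(2)


if __name__ == "__main__":
    for n_i, n_j in ((5, 5), (5, 15), (15, 15)):
        print(n_i, n_j, variance_factor(n_i, n_j),
              round(t_o(13.0, 10.0, 5.0, n_i, n_j), 3),
              is_significant(13.0, 10.0, 5.0, n_i, n_j, 3.77))
```

The same mean difference yields quite different values of the statistic for the three combinations of ns, which is the discrepancy that is ignored when a single n is used for every pair.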
REPLY TO WILCOCK ON GENE ACTION AND BEHAVIOR 1


D. D. THIESSEN 2


University of Texas 


A thorough review of single-gene studies of behavior was recently published 
by Wilcock. His evaluation is that most studies are "trivial" in that they 
concentrate on obvious peripheral mechanisms without deep psychological 
meaning. The evaluation neglects that peripheral mechanisms often account 
for much of the normal range of variation, that single genes can be used to 
preset physiological parameters, and that simple explanations can prevent 


premature neurologizing. 


Wilcock (1969) has recently published a 
thorough review of single-gene action and be- 
havior. Many published as well as unpub- 
lished studies are ably critiqued, and it is easy 
to agree with him that the research gains in 
the area have not been spectacular. Perhaps, 
though, Wilcock was a bit hasty when he 
concluded that the behavioral differences in- 
vestigated thus far do not merit further atten- 
tion, and when he dismissed genetic varia- 
tion attributable to peripheral mechanisms 
(e.g., retinal degeneration, skin thickness, and
body weight) as “trivial” and not offering
mechanisms of deep psychological meaning.

Three examples cited by Wilcock give the
flavor of his thinking. First, Lockard (1964) 
pointed out that albino rats (cc) avoid light 
at lower intensities than pigmented rats (Cc 
or CC). Wilcock pointed out that the photo- 
phobic response to light is predictable on the 
basis of pigment loss in the iris and retina 
and is thus trivial pleiotropism. Second, 
Herter (1938) and Herter and Sgonina 
(1938) demonstrated that thermal preferences 
in two lines of mice could be related to a 
single gene which apparently acts indirectly 
through changes in skin thickness and hair 
density. Wilcock sees this as one of the 
clearest examples of trivial pleiotropism. 
Third, Thiessen (1965) related locomotor 
difficulties in the wabbler lethal mouse (wl) 
to the developmental course of myelin de- 
generation in the central nervous system. For 
Wilcock the behavioral variation is accepted 


1 Supported by Grant MH 14076-02 and by Research
Development Award MH 11,174-02 from the [...]
Thiessen, Department of Psychology, University of
Texas at Austin, Austin, Texas 78712.


as an inevitable consequence of brain dis- 
turbance and is thus merely an indirect 
measure. 

In all three examples the behavior and gene
action are closely tied and in retrospect may 
have been predicted in advance. This in fact 
is probably not true, and in any case it is 
doubtful that structure-function relations 
should be ignored in favor of genetic varia- 
tion that affects behavior through unknown 
(and more central?) pathways. One gets the 
impression from reading Wilcock that (a) 
major gene effects are only valuable if they 
work within the normal range of variation; 
(b) gross behavioral variations tell us nothing
about normal behavior; and (c) a psycholog- 
ical function is only important if it cannot be 
explained by pointing to peripheral mechan- 
isms. On the contrary, I would like to sug- 
gest that it is often the obviousness of a be- 
havioral and physiological change due to a 
major gene substitution that allows us to 
penetrate into the normal workings of a 
system and prevents us from falling into the 
trap that we have discovered something sig- 
nificant only if we cannot reveal structure- 
function relations. Illustrations not referred 
to by Wilcock may help clarify this point. 

Most species of Drosophila show a photo- 
tactic response toward light, especially when 
agitated and under pressure to escape preda- 
tion (Lewontin, 1959; Spassky & Dobzhan- 
sky, 1967). A great deal of genetic variation 
is present, however, and some strains are 
naturally photonegative or can be selected 
for that trait (Hirsch, 1967; Pittendrigh, 
1958). Single-gene and chromosome studies 
have revealed that the mediating mechanism 
in part responsible for flight toward light is
the number of ommatidia (eye facets) stimulated
by light. Brown and Hall (1936) found
in different genotypes that the rank order
relation of light-positive responses and complexity
of visual receptors is red full > white
full > white bar > red bar. The bar genes in
particular reduce eye size and photopositive
reactions. When eye size and hence number
of ommatidia are plotted against threshold
sensitivity to light the plot is linear, indicating
a close relation between the number of
receptors in the eye and the tendency to re- [...]
The selection
for escape responses toward light has apparently
succeeded because of the genetic opportunities
latent in the visual system. It
is difficult to view this structure-function
relationship as trivial and without meaning [...]

A second example, also involving the visual
system, shows how unintelligible genetic findings
may reduce to obvious explanations when
the underlying mechanism is known. Albino
rats of various strains are deficient in inter- [...]
out by Lund (1965) that albino rats as opposed
to pigmented [...]
optic fibers crossing [...]
The loss of information because of a deficiency
of crossover routes precludes much
interocular transfer among albino strains.
Certainly the genetic difference, however
gross, now has a sufficient explanation at the
structural level, which it never had before.
The explanation will now generalize to situations
wherever the optic differences appear.
A third illustration emphasizes how physi- 
ological analyses of a single-gene substitution 
may provide simple descriptions of other- 
wise obscure Phenotypes. A recent examina- 
tion of several Strains of mice by Bovet, 
Bovet-Nitti, and Oliverio (1969) pointed up 
the difficulty that the C3H strain has in 
Shock-avoidance shuttle responses using the 


onset of light as a conditioning stimulus. 


mice, C3H mice never perform above chance. 


D. D. THIESSEN 


but deemphasize as important, is that nearly 
all C3H strains examined, at least one form 
of CBA mouse, and severa] Swiss mice (all 
“cognitively” deficient in regard to avoidance 
conditioning) suffer from retinal degeneration 
(rd/rd) and probably cannot see the light 
Stimulus that precedes the onset of shock 


channels and enter the visual system directly 
through the skull (Ganong, Shepherd, Wall, 
Van Brunt, & Clegg, 1963). Tt is hardly sur- 
prising, therefore, that Some strains do rela. 
tively poorly when a discrimination depends 
9n the perception of light. The findings may 
be trivial in Wilcock's Sense, but at the very 


visual cortex (Terry, Roland, & Race, 1962). 
Any task which depends on the integrity of 
units would be affected. Paren- 
thetically, for problems in which visual sensi- 
tivity is not a prerequisite of learning and in 
which the cortex plays little or no part (eg, 


Thus, it would seem that major
stitutions can provide useful
about control mechanisms of behavior within
(and without) the normal range of varia-
tion. For one thing, they can be used to
preset physiological parameters without phys-
(1956) anus, 6 Prtanism, Brown and Hall's


recently, in the m
GENE ACTION AND BEHAVIOR


alleles at the C locus which differ in the 
activity of the enzyme tyrosinase. Our data 
indicate some relationship between the en- 
zyme level present and the kind and degree 
of behavior affected. Single-gene screening is 
an expedient method of evaluating a primary 
physiological response. Even if we later find 
that the effects are pleiotropic to other physi-
ological and morphological processes we will
not be disappointed, as any gene effect on be- 
havior is ultimately pleiotropic. 

Single genes can also provide simple ex- 
planations for behaviors that otherwise would 
receive overinterpretation. Obviously, not all 
explanations will be as clear-cut as those 
described above, and even these are doubt- 
lessly incomplete. However, we might as well 
account for as many as possible before 
proceeding to more difficult problems. In my 
opinion, describing any significant part of 
behavioral variation, however trivial in retro- 
spect, is a worthwhile enterprise. Explana- 
tions of greater psychological meaning may 
be a myth and reflect an outworn tradition 
of continually pushing explanations into the 
deep recesses of the central nervous system 
where they can neither be verified nor re- 
futed. Mechanisms of behavioral organization 
may be enfolded like layers of an onion, and 
when they are sequentially peeled away little 
variation may be left for speculation. 

I certainly agree with Wilcock that past
efforts have not been shining examples of re- 
search methodology or judicious choice of 
research material. Nevertheless, no research 
strategy should be discarded so early in the 
game without an adequate trial. Major genes 
with great impact on the phenotype may in 
fact be keystones for species-specific be- 
haviors and evolutionary advance (Rendel, 
1967).
AND BEHAVIOR:


A CLARIFICATION : 


JOHN WILCOCK 2 


University of Birmingham, England 


In criticizing my review of single-gene studies of be 


number of methodological points. It 


paper was intended not as a contribution to methodology


of empirical findings and their implici 


In a reply to my review of single-gene 
action and behavior (Wilcock, 1969), Thies- 
sen (1971) accuses me of being hasty in con- 
cluding that behavioral differences so far 
attributable to single-gene differences do not 
merit further attention. Though I do not 
think the conclusion was hasty, I do think 
there may be some misunderstanding of what 
is meant here by "further attention." The 
point I was trying to make is that the ex-
planation for the majority of findings is al-
ready at hand, and no further investigation is
necessary to account for the differences found.
That is to say, the differences were built into


the experimental designs and procedures, T 


nature of their results, 

This is not intended to 
findings or that the results of other similar 
single-gene studies could have no value, 
though my use of the emotionally charged 
word “trivial” may have been unhelpful. My 
main concern was a current evaluation of the 
single-gene literature, and though I did make
tentative suggestions about possible single-
gene contributions in future, I refrained from
evaluating single-gene analysis as a method.
Any apparent difference between my position
and that of Thiessen probably reduces to a
matter of emphasis. My objection is not that
such experiments should be carried out, but to
how or why they have been carried out up to
the present time.


imply that the 


1 Supported by a research grant from the (British) 
Medical Research Council to P. L. Broadhurst and


is now 


v, but as a critique 
it or explicit interpretations, 


In my view, 
retinal degeneration (rd) and mice without 


used successfully by 
in an investigation of ingestive be-
havior in obese (ob), viable yellow (A^vy),
and diabetic (db) mice. Thiessen (1965)
himself has also used this approach, and
though I expressed reservations about the
work, my objections were certainly not
methodological.


Clarification of my own position regarding 


single-gene analysis may perhaps be most 
simply achieved by taking up some of the 
points made by Thiessen. From reading my 
paper, Thiessen gained the impression that
I believe that

(a) major gene effects are only valuable if they
work within the normal range of variation; (b)
gross behavioral variations tell us nothing about
normal behavior; and (c) a psychological function
is important only if it cannot be explained by point-
ing to peripheral mechanisms [1971, p. 103].


fault may lie 
analysis, in- 


1 to pigmentation alleles, 
because such genes lie “within the usual 


range of variation [Thiessen, 1966, p. 901].” 


3 Fuller, d. Genes
behavioral mechanisms,
annual meeting of the Am
sociation, Washington, D.
A REJOINDER TO THIESSEN


The majority of alleles which I discussed are 
abnormal in the sense that they do not occur 
with appreciable frequency in natural popula- 
tions. I would agree however, that for certain 
purposes this may be of no consequence. A 
distinction made by Fuller (see Footnote 3) 
may clarify the point. If one is interested in 
the heritability of traits or the evolutionary 
consequences of genetic innovation, both major 
objectives of psychogenetics, then the fre- 
quency of a gene in natural populations is 
crucial. On the other hand where a gene can 
be regarded as analogous to a surgical or 
chemical treatment, in a kind of physiological 
behavior genetics intended to “preset physi- 
ological parameters without physical insult to 
the organism" as Thiessen suggests, then the 
issue is immaterial. Some reservations about 
regarding single genes as treatments in this 
sense, however, should be expressed. Such 
treatments must be considered in develop- 
mental perspective. For example, an rd mouse 
is not a normal mouse with enucleated eyes, 
nor is an albino a normal mouse with unpig- 
mented skin. These deficiencies have a de- 
velopmental history in which background 
genic interaction and genotype-environment 
interaction may have compensated for the de- 
fects in particular ways. Thus, the spatial 
ability of congenitally blind people may be 
quite different from that of people blinded in 
later life (Fisher, 1964). 

As for the second point, that gross be- 
havioral variations tell us nothing about nor- 
mal behavior, I could not, of course, maintain 
such a position in general. Nevertheless, in 
the context of my review, I still maintain 
that single-gene analysis has made little con- 
tribution to an understanding of normal be- 
havioral variation in such psychologically 
interesting traits as escape avoidance con- 
ditioning, open field “emotionality,” and 
water escape. 

The third point concerning the importance 
of peripheral mechanisms does less than
justice to the argument I was trying to de- 
velop. My thesis was that the majority of 
findings could be explained by peripheral 
mechanisms instead of by the complex psy- 
chological factors implied or explicit in the 
arguments of the investigator. Peripheral 
mechanisms may well play a role in normal
behavioral variation. For example, in recent
work on interocular transfer problems Chor- 
over and Chase (1968) found that, because 
of the absence of a normal iris or sclera in 
albino rats, contact lenses intended to be 
opaque did not completely occlude vision. 
This suggests that albino eyes are sensitive to 
stimuli in a range outside the periphery of 
pigmented eyes. Chance⁴ has pointed out
that in social encounters, which he has in- 
vestigated extensively in laboratory rats 
(Grant & Chance, 1958), albinos respond to 
the movements of opponents over a wider 
range than do pigmented rats. Such a factor 
could conceivably confer an advantage on 
albinos in competitive situations. If this 
mechanism were to maintain albinism as a 
balanced polymorphism in natural popula- 
tions, there would be no question but that, 
in spite of the simple structure-function 
relationship, the finding was of great interest 
and importance. The only work involving 
balanced polymorphism in psychogenetics ap-
pears to be that on the alleles (Rt/rt) in the
moth Ephestia, referred to in several places 
by Caspari (1958, 1967). So far, however, 
no detailed account of relevant behavioral 
observations appears to have been published. 

Thiessen’s chief proposition appears to be 
that the obviousness of behavioral physiolog- 
ical relationships offers insights into the nor- 
mal working of such systems and parsimoni- 
ous explanations for them. To reinforce his 
argument, he discusses papers from the poly- 
genic literature, including a study of inter- 
ocular transfer in rats by Sheridan (1965) 
and an analysis of genotype-environment 
interaction in escape avoidance conditioning 
in mice by Bovet, Bovet-Nitti, and Oliverio 
(1969). I agree that the subsequent work of 
Lund (1965) on uncrossed visual pathways in 
rats now offers a parsimonious explanation 
of Sheridan's findings. It is in the nature of 
psychological enquiry that our current ideas 
and theories must be modified in the light of 
subsequent findings. The explanation of the 
results of the experiments of Bovet et al. in 
terms of the rd allele, on the other hand, 
does not strike me as more parsimonious: it
merely indicates little need for the experiments
in the first place. Information on the
genetic load of most inbred mouse strains, if
not widely known, is at least widely published
and easily accessible. To call such mice “cog-
nitively deficient” in avoidance conditioning
on this evidence certainly would be premature
theorizing. It is to the premature theorizing,
or more specifically, to the fact that all avail-
able information has not been fully con-
sidered, that my major criticism has been
aimed.


4 M. R. A. Chance, personal communication, 1969.

QUANTITATIVE VERSUS QUALITATIVE RNA AND
PROTEIN CHANGES IN THE BRAIN
DURING BEHAVIOR 1


JOHN GAITO 2 AND KENNETH BONNET


York University


Research purporting to demonstrate the occurrence of quantitative or qualita-
tive changes in ribonucleic acid (RNA) and/or protein during simple and
complex behavior is reviewed. Limitations in interpretation of research results
and methodological problems are discussed. Much research indicates that
quantitative changes do occur. In general, increments tend to result with mild
or moderate stimulation; decrements, with drastic or prolonged stimulation.
However, there is no conclusive evidence to indicate that qualitatively different
RNA and/or protein species are synthesized during behavior. The necessity for
research involving methods sensitive to the detection of qualitative changes is
suggested.


In recent years tremendous advances have 
occurred in the area of biology (molecular 
biology) emphasizing the molecular aspects 
of organisms. These events influenced a num- 
ber of behavioral scientists and led to the de- 
velopment of an area of research which has 
been called Molecular Psychobiology (Gaito, 
1966), Molecular Neurobiology (Schmitt, 
1967), or Molecular Psychology (Moore & 
Mahler, 1965). This area of research has been 
concerned mainly with the role of ribonucleic 
acid (RNA) and protein in behavior, with an 
emphasis on learning events. 

Approximately 20 years ago (at a time 
when protein molecules were considered to be 
the genetic substance) a number of individ- 
uals suggested the possible involvement of 
proteins in learning events (Gerard, 1953; 
Halstead, 1951; Katz & Halstead, 1950). 
Halstead developed the first systematic set 
of hypotheses in which nucleoproteins were 
modified as a result of experience to provide 
“memory traces.” Consistent with these spec- 
ulative hypotheses were some of the results 
reviewed by Ungar and Irwin (1968) dem- 
onstrating rapid reversible changes in neural 
protein configuration during neural stimula- 
tion. 


1 The preparation of this manuscript was fa-
cilitated by grants from the United States Office of
Naval Research and the National Research Council 
(Canada). 

2 Requests for reprints should be sent to John 
Gaito, Department of Psychology, York University, 
Toronto, Ontario, Canada. 


In recent years in the research and the- 
orizing of Hydén and others, the position has 
been advanced that during behavior, espe- 
cially learning, RNA and protein play an 
important primary role. There are at least 
two important assumptions within this posi- 
tion: (a) during the behavior of concern, 
qualitatively different RNA and protein mole- 
cules are synthesized which are unique in the 
history of the neural tissue involved or are, 
at least, unique to the behavior; (b) these 
molecules have a primary role in mediating 
the production of behavioral responses to the 
provoking stimulus (i.e., the presence of these
molecules does not indicate merely a sec- 
ondary or correlative effect). Assumption b 
is pertinent and important, but unfortunately 
one which cannot be confirmed or refuted 
due to the grossness of the approaches ap- 
plied to this problem to date. On the other 
hand, Assumption a can be assessed at this 
time. Careful analysis of the research litera-
ture indicates no conclusive evidence for this
assumption. The purpose of this paper is to 
review research concerned with quantitative 
and qualitative changes of RNA and proteins 
during behavior to evaluate the apparent evi- 
dence for qualitative changes. Because the 
literature is voluminous in specific areas of 
this review, and many studies are duplicates 
of others, only the papers from systematic 
research programs and those presenting a 
major or unique point are cited.

QUANTITATIVE CHANGES


Quantitative changes in RNA or protein 
within a given portion of neural tissue (e.g., 
amounts per gram of tissue or per cell) can 
occur primarily through intercellular ex- 
change of materials and intracellular syn- 
thesis and degradation. To determine the 
amount of synthesis of a specific substance, 
a radioactive precursor may be used. It is 
necessary to be cautious in the interpreta- 
tion of some radioisotope results. For ex- 
ample, autoradiography is a method in which 
the density of grains produced by radioactive 
disintegrations in a tissue slice containing 
labeled precursors is determined; this density 
is assumed to vary directly with the amount 
of labeled precursor incorporated and, thus, 
to indicate amount of synthesis. The use of 
this technique as an indicator of the presence 
or absence of synthesis sites is valuable; its 
use for quantitative purposes is less defini- 
tive. This procedure does not differentiate be- 
tween the amount of a substance, for in- 
stance, RNA, synthesized and the specific 
activity of the RNA (i.e., the amount of 
label in a specific amount of RNA). For ex- 
ample, a tissue under one condition may 
show twice the grain density of the same
tissue during another condition. One might 
infer that twice as much RNA has been syn- 
thesized in the first condition. It is possible, 
however, that the amounts of RNA synthe-
sized in the two conditions are the same but
that the amount of labeled precursor, for in-
stance, uridine-5-H³, incorporated is twice as
great in the first condition. The greater spe- 
cific activity in the first condition could re- 
sult because cell membrane permeability was 
greater, allowing greater amounts of labeled 
precursor to enter the cell pool (labeled and 
unlabeled precursors). Even if the amount of 
labeled precursor entering the cell were the 
same in both conditions, greater specific ac- 
tivity could occur in the first condition be- 
cause of the presence of twice as many 
uridine sites in the synthesized RNA. 

Furthermore, if tritium (H³) labeled pre-
cursors are utilized (which is usually done
with autoradiography), there is the possibil-
ity of accumulation of “noise” by hydrogen
exchange phenomena, especially at the cell
membrane (i.e., significant amounts of H³
will replace hydrogen in cell membrane con-
stituents). Thus, great care must be exer-
cised in evaluating autoradiograms to dis-
tinguish between “noise” and intracellular
synthesis.

The implication of the above is that for an
investigator to provide conclusive evidence
concerning amount of a substance, for in-
stance, RNA, synthesized in a specific be-
havioral event, he must equate the specific
activities and amounts of labeled precursors
entering the cell or tissue. To do so, he needs
to know (a) the amount of label in the RNA
fraction, (b) the amount of RNA, (c) the
amount of label in the tissue pool of RNA
precursors, and (d) the amount of labeled
and nonlabeled RNA precursors (ribonucleo-
tides) in the tissue pool. With this informa-
tion he can determine the specific activity
of RNA (cpm/mg of RNA), and the specific
activity of the cell pool (cpm/mg of ribo-
nucleotides). The ratio of these two, the rela-
tive specific activity, is a good indicator of
synthesis because it considers the amount of
RNA and the amount of precursor avail-
able. The same comments apply to protein or
any other chemical of concern. Thus auto-
radiography results, or those obtained from
any method which does not provide the above
information, must be viewed with caution
when referring to amounts synthesized.
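
The four quantities (a) through (d) listed above can be combined as in the following minimal sketch. The function names and all numerical values are hypothetical illustrations and are not taken from any study cited here. The sketch shows two conditions in which the specific activity of RNA differs by a factor of two, as in the grain-density example, while the relative specific activity is identical because the precursor pool was labeled twice as heavily in the first condition.

```python
def specific_activity(cpm, mg):
    """Counts per minute per milligram of material (RNA or precursor pool)."""
    if mg <= 0:
        raise ValueError("amount of material must be positive")
    return cpm / mg


def relative_specific_activity(rna_cpm, rna_mg, pool_cpm, pool_mg):
    """Specific activity of RNA divided by specific activity of the pool."""
    return specific_activity(rna_cpm, rna_mg) / specific_activity(pool_cpm, pool_mg)


# Hypothetical measurements, chosen only for illustration:
# (a) label in RNA, (b) amount of RNA, (c) label in pool, (d) amount of pool.
condition_1 = dict(rna_cpm=2000.0, rna_mg=1.0, pool_cpm=8000.0, pool_mg=2.0)
condition_2 = dict(rna_cpm=1000.0, rna_mg=1.0, pool_cpm=4000.0, pool_mg=2.0)

sa_1 = specific_activity(condition_1["rna_cpm"], condition_1["rna_mg"])
sa_2 = specific_activity(condition_2["rna_cpm"], condition_2["rna_mg"])
rsa_1 = relative_specific_activity(**condition_1)
rsa_2 = relative_specific_activity(**condition_2)

print("RNA specific activity:", sa_1, sa_2)
print("Relative specific activity:", rsa_1, rsa_2)
```

Under these assumed values, the raw label in RNA would suggest doubled synthesis in the first condition, whereas the relative specific activity indicates no difference once precursor availability is taken into account.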


Simple Behavior

Sensory stimulation. In an excellent review,
Pevzner (1966) summarized the results
of studies utilizing sound stimulation, electrical
stimulation, vestibular stimulation, and
a number of other sensory events. He concluded
that quantitative changes


A increases,
derate stim-


l, Chopra,
and D'Monte
monkeys kept in dark for
2 hours showed greater incorporation of
radioisotope precursor in the RNA from the 
occipital cortex than did monkeys who were 
subjected to flickering light for the same time 
period. In another experiment in which rab- 
bits were retained in a dark room for 3 weeks 
before being stimulated by flickering light 
for 2 hours, they reported the occurrence of 
greater protein synthesis in the occipital 
cortex of stimulated animals than for rabbits 
who remained only in the dark for 3 weeks. 

Kogan (1964) investigated the effect of 
light flashes on RNA content. He found an 
increase in RNA content of cortical cells 
when electrical potentials showed excitation 
of cells in the visual cortex. Redistribution of 
RNA from one part of the dendrites to an- 
other was observed in some cases. Those cells 
which did not respond to light flashes did 
not show an RNA increase. 

Gaito, Mottin, and Davison (1968a) found 
that the relative amounts of RNA and pro- 
tein, indicated by RNA/DNA (deoxyribo- 
nucleic acid) and protein/DNA ratios, in- 
creased with 10 and 20 minutes of aural 
stimulation in the auditory area of the rat 
cortex but were back to control levels at 30 
minutes of stimulation. 

Metzger, Cuenod, Grynbaum, and Waelsch 
(1967) used split-brain monkeys stimulated 
by monocular or binocular light flashes paired 
with shocks. The relative rates of protein 
synthesis in cell nuclei, synaptic endings, 
mitochondria, and microsomes in occipital 
and temporal lobes were the same as in the 
contralateral brain regions. 

The effect of drastic stimulation generally
has been to reduce RNA synthesis and con-
tent. Noach, Bunk, and Wijling (1962) used
electroconvulsive shock with rats and sacri-
ficed them 30 seconds or 75 seconds later.
No effect was observed at 30 seconds; at 75
seconds the RNA content of shocked animals
was less than for nonshocked rats. Mihailovic,
Jankovic, Petkovic, and Isakovic (1958)
sacrificed rats one minute after the last con-
vulsion following electroconvulsive shock.
Decrements occurred in all central nervous
system tissue analyzed; the decreases varied
from approximately 10% to 30% with the
greatest decrements occurring in frontal, tem-
poral, and occipital cortices, hippocampus,
cerebellar cortex, thalamus, and hypothala-
mus. Talwar et al. (1966) reported that severe
clonic convulsions induced by metrazol re-
sulted in a reduction in brain RNA synthesis
and a decrement in RNA content.

Altman (1966) used autoradiography to in- 
vestigate protein synthesis during visual stim- 
ulation and visual deprivation in chicks. He 
found no evidence to indicate that “activa- 
tion” of brain structures by stimulation, or 
lack of “activation” by deprivation, affected 
the rate of incorporation of radioactive leucine 
into proteins of cell systems associated with 
that particular function. 

These results by Altman tend to be incon- 
sistent with most of the other results cited 
above. The method of analysis was auto- 
radiography; thus these results should be 
viewed as suggestive rather than as conclu- 
sive evidence (see above cautions). 

The preceding studies, in general, appear 
to show that RNA and protein levels tend to 
increase with mild or moderate stimulation, 
whereas drastic or prolonged stimulation re- 
duces these levels. Such results seem logical 
because in the former case one would expect 
that anabolic processes predominate, whereas 
catabolism would be emphasized in the latter 
situation; RNA and protein synthesis are 
intimately involved in cell metabolism. 

Motor activity. In a forced motor-activity
task, Gaito et al. (1968a) found that the
relative amounts of RNA and protein in-
creased by approximately 20% to 30% within
20 minutes in the motor cortex of the rat
and were back to normal levels at 30 minutes
of motor activity.

Hydén (1964) reported an interesting ex- 
periment with Caribbean Sea barracudas in- 
volving sustained activity. These animals 
were exhausted experimentally by 20 to 30 
minutes of motor activity (hard swimming), 
and then the motor cells in the spinal cord 
(the anterior horn) were dissected and RNA 
extracted. The amount of RNA (in micro-
micrograms, μμg, per cell) was as follows:
control, 3,244; experimentals immediately
after motor activity, 3,416; 1 hour later,
3,637; 2 hours later, 3,658; 3 hours later,
3,723; 4 hours later, 4,029; 5 hours later,
4,019. Thus an increase in RNA occurred
during and after the exhausting motor ac-
tivity. This result is not consistent with other
work; usually RNA content increases during 
the early stages of activity but decreases 
when the animal is exhausted. One explana- 
tion could be the capability of this animal 
for high levels of sustained activity. 
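
The size of the increase reported by Hydén (1964) can be expressed relative to the control value. The following is a minimal sketch using only the per-cell RNA amounts quoted above; the variable and function names are illustrative, and the calculation is a simple percentage change, not an analysis performed in the original report.

```python
# RNA per motor cell (micromicrograms) as quoted from Hyden (1964).
control = 3244
# Keys are hours after the end of motor activity (0 = immediately after).
rna_after_activity = {0: 3416, 1: 3637, 2: 3658, 3: 3723, 4: 4029, 5: 4019}


def percent_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


changes = {
    hours: round(percent_change(amount, control), 1)
    for hours, amount in rna_after_activity.items()
}

for hours, change in changes.items():
    print(f"{hours} hr after activity: {change:+.1f}% relative to control")

# Hour at which the largest increase over control was observed.
peak_hour = max(changes, key=changes.get)
print("Largest increase at hour:", peak_hour)
```

On these figures the increase is modest immediately after activity and largest several hours later, which is the pattern the text describes as atypical.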

Altman (1963) used the autoradiographic 
procedure to investigate protein uptake of 
tritiated leucine in nerve cells during exer- 
cise. Four groups of two rats were used. One 
group was unstimulated; the three other 
groups were exercised in a motor-driven wheel 
for 2 hours. One of these three groups was 
spun at 7 revolutions per minute (rpm) and 
injected intraperitoneally with tritiated leu-
cine after exercise. The second group was
spun at 7 rpm but injected during the exer- 
cise. The last group was rotated at 12 rpm 
and injected after exercise. The group ro- 
tated at 7 rpm and injected during exercise 
showed the greatest uptake of leucine in the 
motor neurons in the lumbar and cervical 
portion of the spinal cord. The other ex- 
ercised groups did not show a consistently 
greater uptake than did the nonstimulated 
animals. 

The uptake of radioactive leucine in 15 
brain and spinal cord regions in nonstimu- 
lated animals was compared to that in the 
rats exercised at 7 rpm and injected during 
exercise. The greatest increase for the exer- 
cised group was in the motor structures (49%
to 66%). The visual area of the posterior
cortex showed a 47% increase. With some
caution, one might conclude that the Altman
results are suggestive of greater protein syn-
thesis during moderate exercise.

These results tend to be partially con- 
sistent with the sensory stimulation research, 
in that increases in RNA occur during motor 
activity. Presumably if motor activity were 
continued for many hours in the above ex- 
periments, RNA and protein synthesis and 
content would decrease below the normal 
levels. 

Chemical stimulation. A number of chemi-
cals have been reported to affect the RNA
content. Egyhazi and Hydén (1961) found
that administration of malononitrile to indi-
viduals with certain psychic disorders in-
creased the content of RNA and proteins in
cells of the central nervous system. They in-
dicated that the malononitrile action was due
to the formation of a dimer of malononitrile,
tricyanoaminopropene (TCP). This chemical
hastens RNA synthesis; it has an antithyroid
effect but causes no observable toxic effect if
given in suitable amounts. They reported
that small amounts of this compound caused
an increase of 25% in the amounts of pro-
teins and RNA in nerve cells and a decrease
of 45% in glial RNA. Essman (1966) also
reported that TCP produced a significant
elevation of RNA levels in brain tissue and
attenuated the effects of electroconvulsive
shock.

Magnesium pemoline was reported to stim- 
ulate RNA polymerases and thereby to in- 
crease the levels of nuclear RNA synthesis 
(Glasky & Simon, 1966; Simon & Glasky, 
1968). Other research has indicated that
RNA synthesis is not enhanced (Morris,
Aghajanian, & Bloom, 1967; Stein & Yellin,
1967); Gaito, Davison, and Mottin (1968)
found that the specific activity and the
relative specific activity of the RNA and
protein fractions in the brains of rats fed
magnesium pemoline were less than for con-
trol rats; these results, however, were ob-
tained in an active shock avoidance task, and
the conditioning aspects may have been re-
sponsible for the decrease.

Nasello and Izquierdo (1969) reported 
that 1 intraperitoneal injection or 10 daily 
injections of saline solution increased the 
RNA concentration in a number of rat brain 
sites. The authors suggested that the se-
quence of stimuli involving the experimenter
prior to the injection may have been re-
sponsible for the increase.

Bowman and Strobel (1969) injected 
labeled RNA precursors intraperitoneally or 
intraventricularly to determine incorporation 
rates during reversal training. Increased in-
corporation occurred in the brain with the
intraperitoneal injections, but not with the
intraventricular route.

Other chemicals drastically reduce the syn-
thesis of RNA and proteins. For
actinomycin-D is an antibiotic which binds


Rabinowitz, & Reich,


of behavioral experiments
ported up to 96% inhibition of cerebral RNA
synthesis following administration of actino- 
mycin D (Barondes & Cohen, 1966a; Ba- 
rondes & Jarvik, 1964; Goldsmith, 1967). 

Puromycin is an antibiotic which inter- 
feres with protein synthesis by causing a 
premature release of incomplete peptides 
(Harbers, Domagk, & Muller, 1968). A num- 
ber of researchers have used this antibiotic 
and related ones (e.g., the heximides) to
drastically curtail protein synthesis (in
mice—Barondes & Cohen, 1968a; Flexner &
Flexner, 1969; in goldfish—Agranoff & Davis,
1968). Metrazol produced severe clonic con-
vulsions and a reduction in RNA synthesis 
and content (Talwar et al., 1966). 

A number of experiments have been con- 
cerned with the effects of chemicals on learn- 
ing or memory, for instance, yeast RNA
(Cook, Davidson, Davis, Green, & Fellows, 
1963; Corson & Enesco, 1966; Wagner, Gard- 
ner, & Beatty, 1966); ribonuclease (Corning 
& John, 1961); 8-azaguanine (Chamberlain, 
Rothschild, & Gerard, 1963; Dingman & 
Sporn, 1961; Gerard, 1963); TCP (Brush, 
Davenport, & Polidora, 1966; Essman, 1966; 
Gurowitz, Gross, & George, 1968; Solyom & 
Gallay, 1966); magnesium pemoline (Beach 
& Kimble, 1967; Gaito et al., 1968; Gelfand, 
Clark, Herbert, Gelfand, & Holmes, 1968; 
Gurowitz, Lubar, Ain, & Gross, 1967; Her- 
bert, Gelfand, Clark, & Gelfand, 1968; Lu- 
bar, Boitano, Gurowitz, & Ain, 1967; Plot- 
nikoff, 1966a, 1966b, 1967; Smith, 1967). In 
most of these research efforts, however, chemi- 
cal analyses were not attempted. 

Considerable care must be exercised in in- 
terpreting the results of chemical studies. Al- 
though many researchers assume specificity 
of drug action sites, such specificity is probably
rare or nonexistent. It is often assumed, for 
example, that actinomycin-D is a specific in- 
hibitor of RNA synthesis, while puromycin 
and cycloheximide are potent protein syn- 
thesis inhibitors. Yet these three drugs have 
the same net effect, at least in part, in the 
prevention of ribosome maturation; this in 
turn leads to little or no messenger RNA 
reaching the cytoplasm and a deficit in pro- 
tein synthesis. Thus, inhibitors of RNA and 
protein synthesis may have a number of sites 
of action but produce similar effects through
separate modes of action. On the other hand,
many chemicals, for instance, puromycin, may 
produce diverse effects; some of these ef- 
fects may be of such basic nature as to dis- 
rupt many systems (see below). 


Complex Behavior—Learning 


Experiments by Hydén and others indicate 
a change in RNA and/or proteins during 
learning events, usually an increment. Hydén 
has used two types of learning tasks: a task 
requiring rats to balance on a wire to reach 
a platform where food was located, and the 
forcing of right-handed rats to use the left 
hand to obtain food. Increases in average 
amounts of RNA per cell were reported. In 
the first task the experimental rats had an
average of 751 μμg whereas rotated controls
and controls had an average of 722 and 683
μμg, respectively (Hydén & Egyhazi, 1962).
In the second task, the difference was 27 μμg
per cell for cells in the “learning” side of the
brain (dorsal cortex) as compared to 22 μμg
per cell for cells in the contralateral sector
(Hydén & Egyhazi, 1964). Hydén and Lange 
(1968) also reported increased protein syn- 
thesis in hippocampal cells during this latter 
task. 

Kandel and Spencer (1968) have pointed 
out the distinct advantages of utilizing closely 
defined neuronal regions for the determina- 
tion of biochemical changes, since whole brain 
preparations could severely mask any ef- 
fects if small. Caution must be exercised, 
however, in interpreting such data in terms of 
whole organism "learning" per se, in that 
such localized effects may not represent pos-
sible concomitant changes in other brain
structures; furthermore, these changes may 
result from increased metabolic demands in 
the primary projection area. 

Other experimenters have reported RNA 
changes during learning. For example, in a 
number of experiments using shock avoidance 
training with mice, Zemp, Wilson, and Glass- 
man (1967) reported increased incorporation 
of radioactive uridine into brain nuclei RNA, 
in brain ribosomes, and in polysomes of 
trained animals. The increase occurred in the 
upper brainstem and associated structures, 
but a decrease appeared in the cortex. Like-
wise, Nasello and Izquierdo (1969) found
that shock avoidance conditioning resulted in
an increase in the RNA concentration of the 
parietal cortex and in the dorsal hippocampus 
of rats. In the brain of rats, RNA and protein 
levels and synthesis increased during one- 
way active avoidance conditioning when the 
shock level was low, but decreased with a 
higher shock level (Gaito, Mottin, & Davison, 
1968b). 

Another shock avoidance study using au- 
toradiography with rats reported that in- 
creased incorporation of radioactive leucine 
occurred in the nuclei of hippocampus, 
entorhinal cortex, and septal tissues but not 
in other brain areas or in the liver (Beach, 
Emmens, Kimble, & Lickey, 1969). 

Bowman and Strobel (1969) trained rats 
in a Y maze to discriminate spatial cues for 
water. On reversal training these animals evi- 
denced a 25% increase in incorporation of 
labeled precursors in hippocampal RNA. 
Practice of reversals for 60 minutes abolished 
this increment and produced a decrease in 
RNA synthesis in basal ganglia. 

Dellweg, Gerner, and Wacker (1968) per- 
formed a learning experiment similar to the 
one by Hydén and Egyhazi (1962). They re- 
ported that learning produced a significant 
increase in the ribosomal RNA content of 
whole brain but not in transfer RNA con- 
tent. They found also an increased poly- 
ribosome content, which could be solely the 
result of sensory stimulation or altered cel- 
lular chemical environment (Appel, Davis, & 
Scott, 1967; Campagnani & Mahler, 1967). 

An interesting experiment with protozoa 
(Applewhite & Gardner, 1969) found that 
greater incorporation of radioactive uridine 
occurred within 1 minute in habituating sub- 
jects, but the differences in incorporation be- 
tween these subjects and controls had disap- 
peared by 4 minutes. 

Although there are a few exceptions, the 
above summary on simple and complex be- 
havior effects appears to be consistent with 
Pevzner's conclusion that with events involv- 
ing moderate stimulation, increases in RNA 
(and protein) are the usual result, whereas 
when the stimulation is extreme, decrements 
occur. One aspect which makes difficult any 
attempt to relate quantitative changes and 
behavioral events is the possibility that no 
changes in quantities of RNA or protein may
occur within cells during behavior, but that 
redistribution of these chemicals may be of 
importance. Cohen and Jacklet (1965) em- 
phasized that changes in the distribution and 
concentration of RNA in the cytoplasm of 
vertebrate central neurons were related to a 
variety of different functional states in these 
neurons. 


QUALITATIVE CHANGES 


In assessing the possibility of qualitative 
changes in RNA and/or protein, the problem 
becomes exceedingly difficult. Qualitative dif- 
ferences can be in one or more of four types 
of structures: 


1. Primary structure: the linear sequence 
of bases in RNA; the sequence of amino 
acids in proteins. 

2. Secondary structure: the hydrogen bond- 
ing across parts of RNA molecules; the di- 
sulfide or hydrogen bonds between different 
parts of the protein molecules. 

3. Tertiary structure: the overall configura- 
tion or conformation of the substance within 
the cell (e.g., coiling). 

4. Quaternary structure: the complexing of 
RNA or protein with each other or with 
other molecules within the cell environment 
(e.g., DNA-histone complex, ribosomes and 
polyribosomes). 


The tertiary and quaternary structures
are difficult to evaluate because extraction 
procedures tend to modify the natural condi- 
tion of the molecules. Likewise, some pro- 
cedures may affect primary and secondary 
structures as well. Most of the recent research
and theorizing on the possibility of qualitative
changes in RNA or protein have emphasized
changes in primary structure.


Simple Behavior 


The experiments on simple behavior usually 
have not been concerned with qualitative 
aspects of changes in RNA and protein; the 
concern has been with increases or decreases 


qualitative aspects of both RNA and pro- 


(1966) used column 
arate the brain RNA 
of rats into four fractions based on molecular
characteristics. They found no differences in 
Fractions 1 and 4 for normal and convulsive 
rats; but Fraction 2 was greater for the 
latter animals, whereas Fraction 3 was greater 
for the former rats. They found that the 
major difference between monkeys kept in
the dark for 2 hours and those subjected to 
flickering light for the same period was in 
Fraction 1; the proportion of this RNA from 
the occipital cortex seemed to increase in 
monkeys exposed to light. These results may 
be indicating only quantitative changes, how- 
ever. In later work Singh and Talwar (1969) 
attempted to identify the proteins synthesized
in the visual cortex during light stimulation.
They reported that there was a group
of acidic low-molecular-weight proteins whose
synthesis was stimulated during exposure of
the monkey to flickering light. These results
may suggest that proteins of different primary
structure are synthesized during visual stimulation;
such a possibility would require that
RNA of different primary structure be
present also.

Rappoport and Daginawola (1968) reported
base ratio changes in RNA of olfactory
bulbs of intact or split-brain fish subjected
to novel odors in normal sea water.
These changes were observed up to 3 hours. 
Changes were reversed in nuclear RNA by 24 
hours if no further stimuli were presented. 
These results may indicate the synthesis of 
RNA of different primary structure than that 
of RNA synthesized during the control con- 
dition. Base ratio changes, however, may in- 
dicate only quantitative changes (see below). 


Complex Behavior 

There are five sets of experimental studies 
which have been assumed by researchers to 
be relevant to the appearance of qualitatively 
different RNA and/or protein during learn- 
ing. 

1. In a number of articles, Hydén has in- 
dicated that base changes occur in glial RNA 
and nuclear RNA of neurons, but not in 
neuronal cytoplasmic RNA, during the performance
of two learning tasks: rats balancing
on a wire to reach food and rats
forced to use the nonpreferred hand to reach
food. The results of these experiments were
summarized by Hydén and Lange (1965). 
Base analyses indicated that during the early 
portion (3 to 5 days) of the tasks, adenine 
and uracil predominated in the change in 
RNA. Later, increased amounts of guanine 
and cytosine were present. Hydén assumed 
that these base ratio changes indicate that 
the RNAs synthesized during these tasks 
differ in primary structure from those syn- 
thesized during control conditions. 

Hydén (1967) maintained that the base 
ratios of the RNA synthesized in the cortex 
of rats during the transfer of handedness was 
consistent with that of the composition of
messenger RNAs which would be required to 
code for proteins that are reported to be 
specific to the nervous system. These are the 
low-molecular-weight acidic proteins, denoted 
as S-100. These proteins are localized mainly 
in the cell bodies of the glia and to a lesser 
extent in the nuclei of neurons, but are not 
found in the cytoplasm of neurons. The S-100 
proteins show rapid turnover, reaching high- 
est specific activity in vivo within 30 minutes; 
the specific activity declines markedly within 
24 hours (Hydén & McEwen, 1966). 

In a more recent experiment with the 
handedness paradigm, Hydén and Lange 
(1968) used electrophoresis to separate various
protein fractions from hippocampus of
rats. They found that the incorporation of 
radioactive leucine into two fast-moving frac- 
tions was greater in the trained animals. 
They interpreted these results as indicating 
greater protein synthesis. This interpretation, 
however, has been questioned (Bowman & 
Harding, 1969); the cautions suggested above
concerning synthesis are appropriate here. 

One might question whether these tasks in- 
volve learning and also the appropriateness 
of the brain area from which the RNA was 
extracted (the vestibular nucleus in the first 
study; the anterior dorsal cortex in the hand- 
edness experiment) in determining RNA 
changes in learning; however, of most con- 
cern to the objective of this paper are the 
base changes. There seems to be the implication
in this research that base changes are
unique in learning and occur within cells
involved in the learning event. There are a
number of studies, however, which question
this implication because base changes occur
in many situations. Egyhazi and Hydén
(1961) noted that administration of TCP 
brought about a decrease of 6% in cytosine 
and a 25% decrease in guanine. Geiger (1957) 
reported that electrical stimulation of the 
cerebral cortex for 30 seconds caused an increase
in cytosine and adenine, whereas the
amount of uracil and guanine remained constant.
Grampp and Edstrom (1963) found
that a 6-hour excitation of certain cells re- 
sulted in an increase in the adenine/uracil 
ratio. Edstrom (1964) showed that marked 
changes occur in the adenine/guanine ratio 
in nerve fibers of the goldfish after transec- 
tion of the spinal cord. The most convincing 
evidence contradicting the importance of base 
changes in learning is the report by Hydén, 
Egyhazi, John, and Bartlett (1969) and John 
(1967) in which the base changes which ap- 
peared in planarians subjected to a condi- 
tioning procedure were the same as those re- 
ceiving pseudoconditioning, or in a direction 
opposite to expectations. The training of the
planarians was achieved in John's laboratory, 
and the biochemical analyses were performed 
by Hydén. 

Even if one grants the premise that there 
are base changes somewhat unique or specific 
to learning, the question still remains as to 
whether these are indicative of a qualitative 
change in RNA. The meaning of these 
changes is not clear. Two major alternatives
are possible. A qualitative change may be
occurring: RNA may be modified to produce
a new molecular species, new RNA
species may be synthesized, or RNA from
certain cells may permeate other cells such 
that the neuronal RNA population contains 
new types of RNA. On the other hand, quan- 
titative changes may be involved, that is, the 
relative amounts of synthesis and/or degra- 
dation of the species of transfer, ribosomal, 
and messenger RNAs may be changing. The 
latter appears to be the more likely one. The
reported base changes may be statistical arti- 
facts resulting from the pooling of hundreds 
of RNA species. As an example, assume, for 
convenience, that there are only two species 
of RNA, one containing 20% adenine and
the other, 25% adenine. Assume also that
both are present in equal quantities during
normal conditions. If a base analysis were
performed, a mean of 22.5% adenine would
result. If under learning conditions, the first 
species was 10 times as plentiful as the sec- 
ond one, the mean percentage adenine would 
be 20.5. Thus with this simple example, one 
might interpret the change from 22.5% to 
20.5% adenine as a qualitative change. The 
problem of this type of artifact is more pro- 
nounced in the actual cellular condition with 
hundreds of RNA species present. 
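
The pooling artifact in the two-species example can be checked with a few lines
of arithmetic. The following is a minimal sketch only; the function name
pooled_percent and the list layout are assumptions introduced for illustration,
and the numbers are the hypothetical 20% and 25% adenine species from the
example, not measured data.

```python
def pooled_percent(base_percents, relative_amounts):
    """Weighted mean base percentage of a pooled RNA sample.

    base_percents    -- percent of a given base in each RNA species
    relative_amounts -- relative quantity of each species in the pool
    """
    total = sum(relative_amounts)
    weighted = sum(p * a for p, a in zip(base_percents, relative_amounts))
    return weighted / total

adenine = [20.0, 25.0]  # percent adenine in species 1 and 2 (hypothetical)

normal = pooled_percent(adenine, [1, 1])     # equal quantities
learning = pooled_percent(adenine, [10, 1])  # species 1 ten times as plentiful

print(f"normal conditions:   {normal:.1f}% adenine")
print(f"learning conditions: {learning:.1f}% adenine")
```

Neither species changes its base sequence in this sketch; only the mixing ratio
changes, yet the pooled adenine value falls by about two percentage points.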

2. In an interesting set of experiments in- 
volving 15-minute shock avoidance training 
with mice performed at the University of 
North Carolina, a number of results which 
are pertinent to the possibility of qualitative 
RNA changes have been found (Adair, Wil- 
son, & Glassman, 1968; Adair, Wilson, Zemp, 
& Glassman, 1968; Zemp, Wilson, Schles- 
inger, Boggan, & Glassman, 1966; Zemp et 
al., 1967). These experiments showed that 
more radioactive precursor was incorporated 
into the RNA extracted from brain nuclei, in 
brain ribosomes, and in polysomes of the 
trained animals. By appropriate controls they
were able to indicate that the increased incorporation
was not due to the effect of light,
buzzer, shock, or handling which the mice
received. To determine if the label was being
incorporated into species of RNA which
might have a unique function in learning, the 
authors sedimented labeled nuclear and ribo- 
somal RNA in a sucrose gradient. The radio- 


mouse, 
but the gross pattern was the same for both;
furthermore, the sedimentation pattern for
the trained animals resembled those found 
after RNA synthesis had been stimulated by 
hydrocortisone in the liver or by estrogen in 
the uterus. These sucrose gradient results are
indicators of the molecular weight of RNA
which cannot show qualitative differences
directly, however; unique RNA species in
learning animals could be of the same molecular
weight as species in nonlearning animals
but could contain different base sequences.
sociated structures; a small decrease of
uridine incorporation took place in the cortex. 

Other experiments by Adair, Wilson, and 
Glassman (1968) were concerned with vari- 
ous behaviors involving the jump box of the 
shock avoidance task. In each case when the 
mouse learned to jump to the shelf in re- 
sponse to the conditioned stimulus, increased 
incorporation of radioactive uridine into brain 
polysomes occurred. If the behavioral experi- 
ence did not include avoidance learning, the 
increased incorporation did not take place. 

Thus, this series of shock avoidance experiments
tends to show a relationship between
the incorporation of labeled precursors of 
RNA and this type of learning. It is not 
clear from these experiments, however, 
whether the RNA being synthesized includes 
RNA for this learning task which is qualitatively
different from RNA synthesized during
other behavioral tasks.

3. Another set of experiments which appeared
at first to offer possible value in determining
if qualitatively different RNA occurs
during learning used actinomycin-D. If
unique qualitatively different RNA species 
are synthesized during learning, and this syn- 
thesis is prevented by actinomycin-D, then 
learning should not occur. The results, un- 
fortunately, are not conclusive. Some indi- 
viduals indicate impairment of learning (Ap- 
pel, 1965; Goldsmith, 1967; Meyerson, Krug- 
likov, & Kolomeitseva, 1965); others report 
no effect (Appel, 1965; Barondes & Cohen, 
1966a; Barondes & Jarvik, 1964; Goldsmith, 
1967; Landauer & Eldridge, 1966). There is 
even one report showing an improvement in 
behavior (Batkin, Woodward, Cole, & Hall, 
1966). Even if impairment of learning were 
demonstrated consistently, however, such re- 
sults would not show definitely that synthesis 
of unique RNA species was abolished. The 
effect could be a result of inhibition of quan- 
titative aspects of RNA, that is, a small 
amount of RNA might be required for learn- 
ing to occur. 

A major problem with these experiments is 
the gross effect of actinomycin-D. Because 
this antibiotic drastically reduces the amount 
of RNA synthesized in the exposed tissues, 
RNA required for normal cell maintenance is 
no longer synthesized, rendering most effects 
of this drug quite general. Furthermore, experimenters
generally use lethal doses to obtain
appropriate effects and must observe the
animal in the period before he dies. Logically 
one would expect to find some decrements in 
general and specific behaviors, if sensitive 
indicators of behavior were used. Thus inter- 
pretations concerning specific behavioral or 
chemical effects are hazardous. 

4. If unique RNA species were synthesized 
during learning, unique protein species would 
be expected also. If this were the case, in- 
hibition of protein synthesis should affect 
learning or memory. 

Using bilateral injections of puromycin in 
temporal lobe areas of mice, Flexner, Flexner, 
and Stellar (1963) found impairment of 
memory for maze avoidance conditioning. 
They reported that injections, 1 day after 
training, involving the hippocampus and ad- 
jacent temporal cortices caused loss of mem- 
ory. Loss of memory following injections after 
12 days or longer required bilateral injections 
in ventricular, frontal, and temporal areas. 
Other results suggested that spread of the 
memory trace from the temporal area to wide 
areas of the cortex required 3 to 6 days. They 
also found that recent reversal learning was 
lost with bilateral injections in the hippo- 
campal-temporal areas, but more temporally 
remote learning was retained. Later work 
(Flexner, Flexner, de la Haba, & Roberts, 
1965; Flexner, Flexner, Roberts, & de la
Haba, 1964) found similar impairment of 
memory when 80% or greater protein syn- 
thesis inhibition by puromycin occurred for 
at least 8 to 10 hours. 

Puromycin injected intracranially in gold- 
fish also produced impairment of memory for 
a shock avoidance response (Agranoff & 
Klinger, 1964). If the goldfish were over- 
trained, puromycin caused no impairment of 
memory. Contrary to the results of the Flex- 
ner group, Agranoff reported that puromycin 
injections beyond 90 minutes after the last 
learning trial had no effect on memory. The 
Flexner group found memory impairment with 
injections 1 to 3 days after learning had oc- 
curred. The differences in the results of the 
Flexner and Agranoff groups could lie in the 
different time parameters utilized, differences 
in brain complexity in fish and mice, and in
different levels of learning employed. The
former group used a criterion of 9 out of 10 
responses correct; fish in the Agranoff studies 
showed about 2 or 3 correct responses in 10 
trials during the last training trials. 

Other experiments reported by Davis and 
Agranoff (1966), Agranoff, Davis, and Brink 
(1966), and Agranoff and Davis (1968) sug- 
gested that protein synthesis was not re- 
quired for learning to occur, but that it ap- 
peared to be necessary for the maintenance 
of “memory traces.”

Barondes and Cohen (1966b) stated that 
inhibition of protein synthesis by puromycin 
injections in the temporal lobe area of mice 
had no effect on the acquisition of a shock 
avoidance response 5 hours later. However, 
45 minutes later retention was decreased by 
more than 50%. Later research by Cohen 
and Barondes (1967a) and by Barondes and 
Cohen (1968a) found similar results. How- 
ever, when they administered acetoxycyclo- 
heximide (a potent inhibitor of protein syn- 
thesis), there was no effect on acquisition nor 
retention up to 3 hours after training, but 
impairment in retention occurred after 6 
hours and later. This impairment occurred 
only when brief training was involved; with 
a longer training period, no impairment was 
observed (Cohen & Barondes, 1967a). They 
reported also that the heximide administered 
with puromycin reversed the effect of puro- 
mycin, that is, there was no effect on either 
acquisition or retention. Barondes and Cohen 
(1968b) found similar effects with cyclo- 
heximide. They found also that the post- 
training presentation of an arousal agent (un- 
conditioned stimulus presentation, ampheta- 
mine, or corticosteroid injection) given within 
3 hours of training to the heximide-treated 
animal attenuated the memory impairment. 

Flexner and Flexner (1966) obtained some- 
what similar results to Barondes and Cohen 
with acetoxycycloheximide, reporting a tran- 
sient impairment of retention 8 to 50 hours 
after training. They interpreted these results 
as indicating that this antibiotic had no effect 
on unique messenger RNAs, but that puro- 
mycin decreases their stability; as soon as the 
effect of acetoxycycloheximide on protein synthesis
had dissipated, the messenger RNAs
could then function for protein synthesis. 
The results of more recent research caused
Flexner and Flexner (1967) to discard their 
hypothesis that puromycin destroys memory 
as a consequence of destruction of unique 
messenger RNAs. They found that injections 
of saline for up to 2 months after treatment 
with puromycin brought about a restoration 
of memory. 

An interesting study by Oshima, Gorb- 
man, and Shimada (1969) utilized electro- 
encephalogram recordings of salmon olfactory 
nuclei to detect recognition of “homing” 
waters, a very specific form of animal mem- 
ory. Appropriate quantities of either puro- 
mycin, cycloheximide, or actinomycin-D were 
given. Animals were tested 4 to 7 hours later 
for recognition of home water. The salmon 
were incapable of recognizing home water 
under any of the three drugs. Implications 
were that continuing protein and RNA syn- 
thesis was necessary for memory “readout” 
to occur, as memory was again detectable in 
these fish 24 hours after drug administration. 
Alternatively, these results may indicate that 
memory “readout” is adversely affected by 
these drugs but becomes available when the 
drugs are no longer functional. The Flexner 
and Flexner results (1967) with saline in- 
jections and the Barondes and Cohen results 
(1968b) with arousal agents are consistent 
with this interpretation. 

That the effect of puromycin on memory
may be an indirect one is indicated by a number
of research results. Cohen, Ervin, and
Barondes (1966) checked the electrical pat- 
terns in the hippocampal region during the 
operation of puromycin and cycloheximide. 
With the former there was an attenuation of 
electrical patterns. The electrical activity of 
mice injected with cycloheximide was indis- 
tinguishable from activity of saline-injected 
mice. Thus they reasoned that the amnesic 
effect of puromycin involved other mecha- 
nisms than its effect on protein synthesis. 
Research results which are consistent with 
this interpretation have been provided by 
Avis and Carlton (1968). A marked decrease 
in the amplitude of hippocampal activity was 
produced by an injection of potassium chlo- 
ride 24 hours after learning; attending this 
diminution of electrical activity was a deficit 
in retention. Cohen and Barondes (1967b)
also found that puromycin increased the
susceptibility of mice to seizures. 

Gambetti, Gonatas, and Flexner (1968) 
found that puromycin produced swelling of 
neural mitochondria whereas acetoxycycloheximide
did not. Puromycin with acetoxycycloheximide
resulted in minimal swelling.
The authors suggested that the deleterious 
effect of puromycin on memory was due to 
action on nerve cell membranes. Recently, 
Flexner and Flexner (1969) described results 
of a number of experiments as evidence for 
the memory block effect of puromycin apart 
from any physiological detriments to cell 
function which might contribute to general 
performance decrement. 

Other general cellular impairments due to 
puromycin have been reported by Appleman 
and Kemp (1966) and by Jones and Banks 
(1969). The former authors reported drastic
decrements in energy metabolism which were
independent of its effect on protein synthesis;
the latter group found decrements in tissue
respiration. 

Studies using puromycin and other protein 
synthesis inhibitors encounter the same haz- 
ards as the studies with actinomycin-D; both 
RNA and protein synthesis are so intimately
concerned in general cell functions as to 
render any conclusion about specific effects 
very tenuous. To attempt to disentangle a 
single specific effect from the overall general 
effect is like looking for a needle in a hay- 
stack. However, even if puromycin consist- 
ently showed a specific effect on memory, 
these events could not be interpreted defini- 
tively as indicating that synthesis of unique 
protein species was being inhibited. Again, a 
certain level of protein might be necessary for 
memory permanency. 

5. A set of experimental results which 
might be assumed to indicate the presence of 
qualitatively different RNA and/or protein 
molecules during learning is the result of the 
"transfer experiment." In this experimental 
paradigm, a group of animals are trained, 
their brains removed and homogenized, and 
the RNA from the brain (or the homogenate) 
is injected into naive animals. A similar pro- 
cedure is followed with a group of untrained 
animals. If a qualitatively different molecule
or molecular system develops during training,
one would expect that this component
would produce facilitation of behavior in the
animals injected with material from trained
animals but that no facilitation would occur 
for animals injected with material from un- 
trained animals. Although the transfer ex- 
periment presents an exciting possibility rela- 
tive to the question of qualitative changes, 
the research literature is quite confusing. 
Some investigators get a positive effect with 
various tasks (e.g., Babich, Jacobson, Bubash,
& Jacobson, 1965; Fjerdingstad, Nissen,
& Roigaard-Petersen, 1965, 1966; Jacobson,
Babich, Bubash, & Goren, 1966; Ungar
& Oceguerra-Navarro, 1965). Others do not 
(e.g., Byrne et al., 1966; Luttges, Johnson, 
Buck, Holland, & McGaugh, 1966). 

A further problem with these experiments 
is that the results are not clear as to whether 
the effect is one of general activation (Halas, 
Bradfield, Sandlie, Theye, & Beardsley, 1966; 
Viney, Branch, & Gill, 1967) or one of a 
specific memory-related nature (Fjerdingstad 
et al., 1966). Related to this aspect is the 
tendency of the experimental- and control- 
injected groups to be similar on the first trial 
in many experiments with differences grad- 
ually appearing over time. This result may 
suggest that something is being transferred 
which improves the general condition of the 
cells and thereby improves performance. 

Furthermore, the results are not clear as to 
which molecule or molecular system is re- 
sponsible for a facilitative effect, if it occurs. 
Some maintain that RNA is responsible 
(Fjerdingstad et al., 1966; Jacobson et al., 
1966) or that facilitative effects are pro- 
duced by ribosomal RNA but not with other 
RNA fractions (Faiszt & Adam, 1968). 
Others maintain that polypeptides or proteins 
are responsible for the facilitation (Rosen- 
blatt, Farrow, & Rhine, 1966a, 1966b; Ungar 
& Cohen, 1965). 

Many of the experiments producing posi- 
tive results used injections outside the brain. 
A number of studies have indicated that little, 
if any, of the material injected in this fashion
reaches the brain (Eist & Seal, 1965; Enesco,
1966; Sved, 1965). Thus facilitation may be 
due to the effect of the injected material on 
other organs than the brain. 
Alternatively, many of the observed specificities
of “transferred” behavior might
arise as a result of injected materials which 
most frequently contain soluble protein, pep- 
tides, RNA, and DNA. Bohus and DeWied 
(1966) demonstrated the specific attenuation 
of extinction by elevated levels of adreno- 
corticotropic hormone, a peptidyl hormone 
which could be contained in many of the 
brain homogenates used to date. Similarly, 
Cook (1964) and Brown (1966) showed the 
characteristics of acquisition in animals given 
yeast RNA to include higher asymptotic re- 
sponse rate, reduced vicarious activity, and 
severe attenuation of extinction of a well 
learned task. These characteristics showed a 
marked similarity to some reported effects of 
brain homogenate injections. Another ex- 
ample would be the report by Peterson (1949) 
which showed change of handedness as a 
result of injecting acetylcholine into rats; 
Byrne and Samuel (1966) and Rosenblatt 
and Miller (1966) reported specific “transfers”
of handedness via brain homogenates.
While such results may be due in part to “information”
contained in the homogenate, the
generality of such an effect must be sys- 
tematically controlled to allow an interpreta- 
tion of specificity of “information” transfer. 

Thus the “transfer experiment” paradigm 
does not lead consistently to the conclusion 
that a unique molecule or molecular system 
facilitates behavior. However, the incorpora- 
tion of other paradigms with that of the 
“transfer experiment” may help to eradicate 
some of the discrepancies which abound. In 
several ingenious experiments, Reinis has in- 
troduced actinomycin-D and puromycin in- 
jections into the transfer paradigm. In one 
experiment (Reinis, 1968), mice were in- 
jected intracranially with actinomycin-D just 
prior to an intraperitoneal injection of brain 
homogenate from trained mice, whereas others 
received a saline injection before the homog- 
enate. The latter mice showed the “transfer 
effect,” but the former did not. In a second 
experiment, Reinis (1969) injected some 
trained mice intracranially with puromycin; 
others received a saline injection. The brains 
of these mice were homogenized and homoge- 
nates were administered intraperitoneally to 
other trained mice. The acceptor mice receiving
a homogenate from mice who were injected
with saline performed as well as during
earlier training sessions; those receiving
the homogenate from puromycin-treated mice
performed like untrained mice. Clearly, further
research with greater control and appropriate
biochemical analyses are necessary
before this line of research can have a direct 
bearing on memory mechanisms. 

The general conclusion to be derived from
the studies reported in this section is that
there is no set of experiments which consistently
and conclusively indicates a qualitative
molecular change during learning or other
behavior. Thus it appears that quantitative 
changes occur in RNA and protein during 
behavioral events, but that further research 
is necessary before the question of qualita- 
tive changes can be answered. 
The major problem in attempting to resolve
the question of possible qualitative
molecular changes during behavior, espe- 
cially during learning, is that the techniques 
utilized so far have been too gross to answer 
this question. For an effect to be considered 
a qualitative effect during behavior, a number 
of criteria need to be satisfied: 


1. The effect must be demonstrated to be 
a function of one behavioral event and not of 
closely parallel behavioral events. 

2. The effect must be demonstrable in a 
class of molecules (or classes of molecules) 
found in the nervous tissue mediating the ob- 
served behavior. 

3. The effect must be a change which is 
clearly qualitative in nature (i.e., a modifica- 
tion in primary, secondary, tertiary, or quater- 
nary structure). This need not imply only 
changes in existing structure of molecules, 
but could imply altered gene expression as 
well, such that qualitatively new species of 
a class of molecules are present. This criterion
is the one of major concern for this
paper.


To date the most sensitive qualitative 
method of protein determination appears to 
be immunological. Unfortunately, this tool 
allows discrete detection of qualitatively different
molecules, but not their characterization.
However, this tool should be useful in
the future. Many qualitative methods for 
protein characterization are available, al- 
though the frequently employed methods of 
electrophoresis and chromatography are qual- 
itative differentiators only when all species of 
protein in the sample are above detection 
threshold for the specific techniques. Thus 
"qualitative" effects demonstrated by one of 
these techniques could be reflecting a quan- 
titative increase in a protein or polypeptide 
normally in very low concentration. Techniques
such as the use of antibiotics or antimetabolites
to selectively inhibit cell function
or molecular synthesis can only provide gross 
answers, with no information as to possible 
qualitative changes accompanying behavioral 
events. 

In that differences in primary structure of
RNA and protein have been suggested or im- 
plied by most investigators, a technique is 
needed which will perform a sequence analysis 
of RNAs or proteins. Sequence analysis with 
short RNA molecules of approximately 75 
nucleotides has been successful. Holley, Apgar,
Everett, Madison, Marquisee, Merrill,
Penswick, and Zamir (1965) were able to 
purify three transfer RNAs from yeast which 
were specific for alanine, tyrosine, and valine. 
They established that the nucleotide sequence 
of each was different by careful analysis of 
breakdown products following ribonuclease 
(RNase) treatment. They determined the 
complete nucleotide sequence of the alanine 
transfer RNA; the complete nucleotide se- 
quence for two transfer RNAs for serine and 
one each for tyrosine and phenylalanine has 
been determined also (Harbers et al., 1968). 
Likewise, the amino acid sequence in some 
proteins is known (e.g., insulin, RNase). Unfortunately,
the determination of sequence
for these short molecules necessitated long 
and painstaking analyses such as to preclude 
their use in uncovering the sequences in 
larger molecules which contain thousands or 
millions of units. 

For molecules of this size there is no reliable technique to perform
a sequence analysis and provide information
as to the exact sequence in the molecule.
There is one method, however, that can differ- 
entiate between RNA molecules of different 
base sequences; this is the DNA-RNA hybridization
procedure (Gillespie & Spiegelman,
1965). This method has been employed fre- 
quently by molecular biologists in recent 
years. For example, Miyagi, Kohl, and Flickinger
(1967) used hybridization techniques
successfully in differentiating RNA of liver 
and kidney and RNA of embryo and adult 
chickens. Stevenin, Samec, Jacob, and Mandel 
(1968) reported the approximate amounts of 
the fraction of the genome in rat brain which 
codes for ribosomal and messenger RNAs. 
Likewise, Bondy and Roberts (1968) found 
these procedures essential in their finding 
that a much greater proportion of RNA syn- 
thesis is oriented toward messenger RNA 
formation in brain than in other organs. The 
possible use of these procedures for behavioral 
work was suggested by Bonner (1966) and 
by Gaito (1966). 

If one heats a solution of brain DNA, for
instance, from rats, at 95 degrees centigrade 
for 10 minutes, the double-stranded DNA will 
split into single strands. If this DNA is then 
poured onto nitrocellulose membranes, these 
membranes will “trap” single strands but will 
allow any double strands to pass through. 
If a membrane with attached DNA is placed 
in a solution of RNA, those RNA molecules 
which are complementary in base sequence to 
DNA sites will become firmly attached and 
be resistant to RNase treatment. If this 
DNA-RNA hybrid is put in another solution 
of the same RNA, no further hybridization 
will occur because all DNA sites comple- 
mentary to the RNA will be occupied. On the 
other hand, if this hybrid is added to a differ- 
ent solution of RNA which is complementary 
to other DNA sites, further hybridization will 
occur. 

brain homogenate injections. Another example
would be the report by Peterson (1949)
which showed change of handedness as a 
result of injecting acetylcholine into rats; 
Byrne and Samuel (1966) and Rosenblatt 
and Miller (1966) reported specific “transfers”
of handedness via brain homogenates.
While such results may be due in part to “in- 
formation” contained in the homogenate, the 
generality of such an effect must be systematically
controlled to allow an interpretation
of specificity of “information” transfer.

Thus the “transfer experiment” paradigm
does not lead consistently to the conclusion 
that a unique molecule or molecular system 
facilitates behavior. However, the incorporation
of other paradigms with that of the
“transfer experiment” may help to eradicate
some of the discrepancies which abound. In
several ingenious experiments, Reinis has introduced
actinomycin-D and puromycin injections
into the transfer paradigm. In one
experiment (Reinis, 1968), mice were in- 
jected intracranially with actinomycin-D just 
prior to an intraperitoneal injection of brain 
homogenate from trained mice, whereas others 
received a saline injection before the homogenate.
The latter mice showed the “transfer
effect,” but the former did not. In a second
experiment, Reinis (1969) injected some 
trained mice intracranially with puromycin; 
others received a saline injection. The brains
of these mice were homogenized and homogenates
were administered intraperitoneally to
other trained mice. The acceptor mice receiv-


behavioral events, but that further research
is necessary before the question of qualitative
changes can be answered.

Putting this procedure within a behavioral 
framework, the rationale is the following: If 
there exist unique species of brain RNA which 
are synthesized during behavior, for instance, 
learning, and RNA from the brain of a non- 
learning animal is hybridized with single- 
strand DNA, then when RNA from the brain 
of a learning animal is added to this hybrid, 
the unique RNA species should adhere to the 
DNA. An important aspect of this successive 
competition hybridization procedure is that 
only the RNA from learning animals is la- 
beled. Thus the presence of a label in the 
twice hybridized DNA will suggest that RNA
species not present in the brain of nonlearn- 
ing animals have been synthesized in learning 
animals during the task. 
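
The logic of successive competition hybridization can be sketched in a few lines of Python. This is a deliberately crude set model and not the laboratory procedure: each RNA species is represented by the name of the single DNA site to which it is assumed to be complementary, hybridization is assumed to go to saturation, and the function names (`hybridize`, `successive_competition`) and the species names used with them are hypothetical.

```python
def hybridize(occupied, rna_species, dna_sites, labeled):
    """Bind each RNA species to its complementary DNA site if that site is free.

    occupied maps a bound site to True when the RNA sitting on it is labeled.
    Sites already occupied stay as they are (no further hybridization).
    """
    for species in rna_species:
        if species in dna_sites and species not in occupied:
            occupied[species] = labeled


def successive_competition(dna_sites, control_rna, learning_rna):
    """Return the sites that end up carrying label after two hybridizations.

    Step 1: unlabeled RNA from nonlearning animals saturates its sites.
    Step 2: labeled RNA from learning animals is added to the same hybrid.
    Only species absent from the control RNA can still find a free site.
    """
    occupied = {}
    hybridize(occupied, control_rna, dna_sites, labeled=False)
    hybridize(occupied, learning_rna, dna_sites, labeled=True)
    return {site for site, is_labeled in occupied.items() if is_labeled}
```

Under these assumptions, label appears in the double hybrid only when the learning RNA contains a species that the control RNA lacks and that has a complementary DNA site. The model says nothing about partial complementarity, incomplete saturation, or the other pitfalls of the real procedure.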

At the present time a number of behavioral 
studies are underway in the Molecular Psy- 
chobiology Laboratory at York University in 
which double hybridization procedures, sup- 
plemented by appropriate single hybrids, are 
involved. Rapid progress has occurred in a 
shock avoidance study. In one experiment 
(Machlus & Gaito, 1968a) labeled RNA from 
avoidance conditioned rats competed with un- 
labeled RNA from nonbehaving control ani- 
mals. Label appeared consistently in the 
double hybrid, suggesting that the brains of 
learning animals contained RNA species qualitatively
different from those in the brains of
nonlearning rats. Results with single hybrids 
for the two groups of rats, and with double 
hybrids in which the RNA from nonlearn- 
ing rats was labeled and the RNA from learn- 
ing animals was unlabeled, provided support 
for this conclusion. In a second experiment 
(Machlus & Gaito, 1968b) similar results oc- 
curred when brain RNA from shock avoid- 
ance trained rats competed with RNA from 
rats in a motor activity task, suggesting that 
the “qualitatively” different RNA species 
were not due to the motor aspects of the 
avoidance task. In two other experiments, 
the same results were obtained when brain 
RNA of shock avoidance animals competed 
with the RNA from yoked controls (Machlus 
& Gaito, 1969). A further experiment in- 
volved competition between liver RNA from 
shock avoidance rats and liver RNA from 
yoked controls; no label was detected in the 
double hybrid. The following is the paradigm
for estimating amounts of RNase-resistant
RNA in single and double hybrids:

Hybrids
1. DNA-RNA_SA*
2. DNA-RNA_S*
3. DNA-RNA_S-RNA_SA*
4. DNA-RNA_SA-RNA_S*

Note. RNA_SA, RNA from shock avoidance animal;
RNA_S, RNA from nonlearning shocked animal;
asterisk indicates the presence of labeled precursor
in RNA.
The results of this series of studies suggest
the synthesis of unique qualitatively different 
species of RNA in brain during this learning 
task, a conclusion which is consistent with 
that of Hydén and others who maintain that 
RNA has an important role in learning events. 
Other experiments, however, are being im- 
plemented to evaluate the possibility that the 
repeated results suggesting unique species are 
artifactual in nature, because the hybridiza- 
tion procedures are complex and present 
many possible pitfalls. Although the results
are promising, at this time we hesitate to
conclude that unique species have been
found; research within the next few years 
with the hybridization procedures should be 
more conclusive one way or the other. 

Work is in progress with a paradigm which 
provides more adequate control of motor, af- 
fective, hormonal, and biochemical aspects 
than do the shock avoidance experiments.
This work utilizes a surgical split-brain prep- 
aration to allow for learning in a single 
hemisphere. The successive competition hy- 
bridization is of intra-animal nature, with the 
RNA from the “trained” hemisphere com- 
peting with RNA from the contralateral 
hemisphere. 


OVERALL CONCLUSIONS

In summary, it appears that we can conclude
the following:

1. Quantitative changes in the synthesis
and content of RNA and protein in the brain
occur during behavior (including learning).
In general, increments tend to result with
mild or moderate stimulation; decrements
with drastic or prolonged stimulation.

2. In spite of much research, there is still
no conclusive evidence to indicate that qualitatively
different RNA and/or protein species
are synthesized during learning and other
behavior.

The results of the hybridization experiments
suggest possible unique RNA species; however,
because of pitfalls with these procedures
much more work is required before a definite
conclusion can be advanced. Further research
with these techniques and the introduction of
other suitable procedures should allow conclusions
of greater specificity from molecular
biological and biochemical approaches to behavioral
problems.
EXPERIMENTAL INDUCTIONS OF THE CONSERVATION OF
“FIRST-ORDER” QUANTITATIVE INVARIANTS

Michigan State University


A review of experiments that have attempted to train Piaget’s concrete
conservations is presented. Pertinent theoretical and methodological issues are
summarized. Empirical questions are posed concerning: (a) whether or not
conservations are trainable; (b) what training methods are most effective;
(c) whether or not specific transfer of training occurs; (d) whether or not
nonspecific transfer of training occurs; (e) whether or not some conservations
are more resistant to extinction than others; (f) whether or not “natural”
conservers are more resistant to extinction than trained conservers. On the
basis of currently available data, reasonably complete answers to Questions
a, b, c, and e are advanced, and judgment is reserved on Questions d and f.
Finally, some of the more obvious directions for future research are considered.


Piaget’s theory of the ontogeny of thought
(e.g., Piaget, 1950, 1952a, 1952b, 1968;
Piaget & Inhelder, 1969) spurns any consideration
of the knowledge of objects per se in
favor of knowledge which pertains to reliable
features of objects-in-general. In this respect,
Piaget’s theory is similar to the early writings
of Wittgenstein (1922). While Wittgenstein
and the logical positivists refer to the “reliable
features” of objects as “facts” (tatsachen),
Piaget has chosen to introduce the notion of
cognitive invariants or constancies to denote
these same “facts.”2

2 These constancies, then, are object-invariants,
and as such they are not to be confused with the
Piagetian notion of functional invariance, the latter
concept being exclusively applicable to the inherent
adaptive tendencies of all organisms.

Piaget holds (e.g., 1968) that subjects acquire
one or more of the cognitive invariants
during each of the four hypothesized stages
of intellectual development (sensory-motor
intelligence, preoperations, concrete operations,
formal operations). Piaget partitions
this group of all possible cognitive invariants
into two mutually exclusive subsets, namely,
qualitative and quantitative invariants.

The cognitive invariants that may be 
termed qualitative include those properties of 
objects that are of an “all-or-none” or “yes-or-no”
variety (e.g., existence-nonexistence,
sex, identity, etc.). The acquisition of the 
qualitative invariants is said to be within the 
province of the first two stages of cognitive 
development (sensory-motor intelligence and 
preoperations). These qualitative advances 
are thought to include the belief that objects 
continue to exist when they are perceptually 
absent and the belief in a constant generic 
identity for all organisms. De Vries (1969) 
has reported evidence that supports Piaget’s 
(1968) contentions about the acquisition of 
one qualitative invariant (generic identity). 

The acquisition of the quantitative invari- 
ants is relegated to the final two stages of 
development (concrete operations and formal 
operations). The present paper is concerned 
with a specific group of Piaget's quantitative 
invariants, the so-called “first-order” quanti- 
tative invariants. The familiar conservation 
problems developed by Piaget and his col- 
laborators (e.g., Inhelder & Piaget, 1958; 
Piaget, 1952b; Piaget, Inhelder, & Szemin- 
ska, 1960) constitute the techniques employed 
to evaluate subjects’ grasp of each of these
invariant properties.

This review considers a subset of the large 
number of "conservation studies" that have 
been published in recent years. Our particular 
concern is with those studies reporting attempts
to induce or “train” children's conservation
of certain quantitative invariants
(number, length, substance, weight). 

The experimental attempts to induce con- 
servation apparently have been motivated by 
the fact that Piaget is concerned exclusively 
with the ontogeny of the “concepts of con- 
servation,” rather than with specific experi- 
ential variables that may facilitate conserva- 
tion. Given the development of such conserva- 
tion phenomena, it remains to ascertain just 
which experiential variables are influential in 
their construction. In numerous publications 
which refer to the subject of conservation, 
Piaget has employed “nonspecific” and/or 
structural features of the reasoning process 
as explanatory principles. Alternatively, there 
are other explanations for the origins of con- 
servation concepts which derive from a gen- 
eral learning theory approach (e.g., Gelman, 
1969; Kingsley & Hall, 1967; Smedslund, 
1961a; Trabasso, 1968). 


Empirical Issues 


Flavell (1963), Wallach (1963), and 
Kohlberg (1968) each have considered some 
of the questions about the ontogeny of conservation
concepts that recent experiments
have addressed. An obvious initial
question must be: (a) Are conservation con- 
cepts trainable? Flavell (1963), and more 
recently Mermelstein and Meyer (1969), 
have concluded—on the basis of what must be 
considered scanty evidence (e.g., Smedslund, 
1961b, 1961d, 1961e, 1961f; Wohlwill &
Lowe, 1962)—that adequate training of 
most conservations is difficult and perhaps 
impossible. 

Flavell, as well as Kohlberg (1968), also 
submits that Piagetian analyses of the experi- 
ences which promote conservation acquisition 
are typically unsympathetic to learning-theory-based
explanations with their emphasis
on specific experience. Instead, Piaget
prefers to explain the acquisition of conserva- 
tion concepts in terms of large quantities of 
generalized experience which promote advances
in certain prerequisite cognitive operations
(e.g., the “reversibility” of thought
transformations and the independence of 
cognition from perceptual influences). Hence, 
the second experimental question to be answered
in the process of conservation training
concerns: (b) the relative efficacy of proce- 
dures that expose subjects to relevant specific 
experiences (counting appears relevant to 
number conservation, and weighing appears 
relevant to weight conservation) versus procedures
that expose subjects to situations designed
to facilitate relevant cognitive transformations
(e.g., reversibility).

Assuming one can isolate independent vari- 
ables which clearly facilitate children's con- 
servation of quantitative invariants, four other 
questions become of interest from the per- 
spective of Piagetian theory: (c) Does con- 
servation training generalize to new stimuli 
designed to assess the same conservation 
(specific transfer)? (d) Does the training of 
one conservation facilitate the acquisition of 
dissimilar conservations (nonspecific trans- 
fer)? (e) Are some conservations more resist- 
ant to extinction than others (independent of 
whether they are acquired naturally or experi- 
mentally)? (f) Does a trained conservation 
concept extinguish more readily than its 
naturally acquired counterpart? 


EXPERIMENTAL INDUCTIONS OF 
CONSERVATION 


Before turning to the results of the con- 
servation training experiments, it is necessary 
to consider a point of theoretical clarification 
and some points of methodological clarifica- 
tion. The theoretical point concerns Inhelder 
and Piaget’s (1958) distinction between 
first-order and second-order thought and the 
characteristic varieties of conservation that 
are said to be possible at each level. The 
methodological points are concerned with the 
issue of what criteria should be used to infer 
that a subject is conserving a given invariant 
property. 


First-Order and Second-Order Conservations 


As part and parcel of his distinction be- 
tween first-order and second-order thought, 
Piaget (e.g., 1949, 1953) has argued that
there are two forms of operational reversibility
(which is why operational reversibility has
been referred to previously as “reversibility”). 
Specifically, the notion of operational reversi- 
bility refers to the permanent possibility of 
a cognitive operation returning to its point
of departure. In fact, operational reversibility 
is a defining and necessary feature of Piaget's 
basic unit of thought, the "operation." While 
both the “preoperations” of early childhood 
and the "operations" of later childhood are 
described by Piaget as internalized action 
schemas, only the latter also are said to be
reversible. 

Piaget's first form of operational reversibil- 
ity pertains to objects (reversal via inversion- 
negation), and his second form of reversibility 
pertains to relations among objects (reversal 
via reciprocity). Since inversion-negation can 
be applied to single objects of thought one at
a time, it is therefore a singular transformation.
On the other hand, reciprocity obviously
is a binary transformation, since one
must consider objects of thought two at a 
time, three at a time, etc., before one can 
properly speak of relations among such 
objects. 

Inversion-negation, then, consists in simply
cancelling or negating a particular mental 
operation and can be illustrated by the em- 
pirical analogy of coin tossing. The physical 
operation (tossing) that results in a coin that 
previously showed heads coming up tails can 
be negated (or inverted) by simply turning 
the coin over. Reciprocity involves compen- 
sating changes in one operation with equal 
and opposite changes in a related operation 
(rather than the same operation as was the 
case for inversion-negation). To give an em- 
pirical analogy of reciprocity, when one ob- 
serves a 7-inch stick, A, and a 5-inch stick, 
B, it is legitimate to affirm, “A is longer 
than B.” The assertion that “B is shorter
than A” is the reciprocal of the preceding
affirmation. It should be noted that the preceding
illustrations are only empirical analogies
of what are mental transformations for
the Geneva school. 
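
The two empirical analogies can be restated as a short Python sketch. It is only an illustration of the formal difference between a transformation applied to one object and a transformation applied to a relation between two objects; it makes no claim about the mental operations themselves. The names `flip`, `Relation`, and `reciprocal` are hypothetical, and reading the reciprocal as the converse relation is an assumption of the sketch.

```python
from dataclasses import dataclass


def flip(face):
    """Inversion-negation analogy: turning a single coin over."""
    return "tails" if face == "heads" else "heads"


@dataclass(frozen=True)
class Relation:
    """A statement about two objects, e.g. Relation("A", "longer than", "B")."""
    left: str
    name: str
    right: str


# Each relation name is paired with the name of its converse.
CONVERSE = {"longer than": "shorter than", "shorter than": "longer than"}


def reciprocal(rel):
    """Reciprocity analogy: swap the two objects and use the converse relation."""
    return Relation(rel.right, CONVERSE[rel.name], rel.left)
```

In this sketch both transformations return to their point of departure when applied twice, which is the sense of reversibility described above; `flip` needs one object, whereas `reciprocal` cannot be stated without two.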

Piaget argues that the two forms of reversi- 
bility are distinct at the level of first-order 
thought (concrete operations), but thoroughly 
integrated at the level of second-order thought 
(formal operations). The initial acquisition 
of the two reversibilities is said to be a 
necessary (but not sufficient) precondition 
for the structuring of first-order thought (and 
the subsequent conservation of first-order 
quantitative invariants). The eventual coordination
of the two reversibilities is said to
promote the later structuring of second-order 
thought (and the subsequent conservation of 
second-order quantitative invariants). It has 
been pointed out elsewhere (Brainerd, 1970) 
that this dichotomy seems significant, because
it divides the set of all possible conservations
into two distinct subsets (first-order
conservations and second-order conservations)
and presumably limits what one
can say about the acquisition of one subset 
based on experiments with the other subset. 

Piaget considers the first-order conserva- 
tions (number, length, height, substance, 
weight, area) as indexes of concrete operations 
and the second-order conservations (volume, 
density, momentum, rectilinear motion) as 
indexes of formal operations. The first-order 
and second-order conservations differ in that 
the latter necessitate the simultaneous and co- 
ordinated application of the two reversibilities 
to observed data, while the former require 
only the successive application of the two 
reversibilities. Thus, it is not necessarily valid
to infer that a training procedure which is 
effective with one or more first-order conserva- 
tions also should be effective with second- 
order conservations. 

This caution against inappropriate generali- 
zation becomes particularly relevant when one 
considers that none of the experiments re- 
ported to date has attempted to induce any 
of the second-order conservations. Since experimental
inductions of conservation have
focused exclusively on first-order invariants, 
any and all conclusions based on the present 
review must be restricted to this group of 
conservations. 

Of these quantitative invariants previously
mentioned as first-order, only four have been
studied extensively, namely, number, length,
substance, and weight.* Concerning experimental
attempts to train these conservations,
the implication of the preceding discussion
is that those training procedures which expose
subjects to some empirical analogy of
one (or both) of the prerequisite reversibilities
should, according to Piagetian theory,
stand the best chance of inducing these
conservations.

* [...]-a-kind study, and since
the concept of “gross size” seems hopelessly confounded
with other spatial concepts such as length
and height, this experiment is not given separate
consideration.


Conservation Criteria


The typical first-order conservation assess- 
ment begins with two identical stimulus ar- 
rays. The perceptual equivalence of the initial 
arrays is obvious, and the equivalence of the 
two arrays with respect to some invariant 
property (e.g., number, length, weight) also
is established to the subject’s satisfaction. One 
of the stimulus arrays then is altered per- 
ceptually. The subject then is asked whether 
the stimulus arrays are still the “same” or 
are now “different” with respect to the in- 
variant property. The subject also may be 
asked to explain or justify his “same” and 
“different” responses. 

There are, then, two main sources of be- 
havioral evidence for the conservation of 
quantitative invariants, namely, subjects’ 
“same-different” responses following deformation
of one array (Elkind, 1967, has called
this array the “variable stimulus”) and subjects’
subsequent justifications of their responses.
The proper use of these sources of
data has been the subject of methodological 
articles by Gruen (1966), Griffiths, Shantz, 
and Sigel (1967), and Rothenberg (Rothen- 
berg, 1969; Rothenberg & Courtney, 1969). 

By reanalyzing the data of a previous ex- 
periment (Gruen, 1965), Gruen (1966) was 
able to demonstrate that more (and younger) 
subjects are judged to be “conservers” (of 
number) if the subjects are not required to 
explain their “same-different” responses than 
if they are required to explain these responses. 
Gruen’s reanalysis suggests that, other things 
being equal, conservation criteria which re- 
quire that subjects explain their responses 
are “tougher” than criteria which do not 
involve such explanations. 

Griffiths et al. (1967) pointed out that the 
relational alternatives posed in conservation 
questions (“same,” “different”) are of un- 
equal difficulty. Griffiths et al. infer this con- 
clusion on the basis of data indicating that 
subjects typically find the notion “same”
harder to comprehend than the notion
“different.” In short, those conservation cri- 
teria employing the notion “same” and/or 
requiring a “same” response should be more 
stringent than those criteria employing only 
the notion “different” and/or requiring a 
“different” response.

Finally, Rothenberg (Rothenberg, 1969; 
Rothenberg & Courtney, 1969) has suggested 
that conservation criteria which require only
that subjects select a “same” or “different”
response from among the alternatives presented
following deformation of one stimulus
array usher in the possibility of “false
positives.” Overall, subjects tend to agree with 
what an experimenter says more frequently 
than they disagree. Hence, it is likely that 
when the correct conservation answer involves 
agreeing with something the experimenter has 
said, there is a high incidence of false posi- 
tives. Rothenberg has attempted to solve this 
problem by splitting the usual conservation 
question into two parts. The correct answer 
to one part is “yes” (i.e., an agreement),
while the correct answer to the other part
is “no” (i.e., a disagreement). A subject
must answer both parts correctly to be judged
a conserver.
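
The way these criteria combine can be made concrete with a small scoring sketch in Python. It is an illustration only and does not reproduce any of the cited authors' scoring procedures: the `Response` fields, the `is_conserver` function, and the `require_explanation` switch are hypothetical names, and the adequacy of an explanation is assumed to have been judged beforehand by a rater.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    # Answer to the part of the question whose correct answer is "yes".
    agree_item: str
    # Answer to the part of the question whose correct answer is "no".
    disagree_item: str
    # Rater's judgment of the subject's justification, if one was collected.
    explanation_adequate: Optional[bool] = None


def is_conserver(response, require_explanation=False):
    """Score one subject on a two-part conservation question.

    Both parts must be answered correctly, so a subject who simply agrees
    with everything the experimenter says is not counted as a conserver.
    With require_explanation=True an adequate justification is also needed,
    which gives the stricter of the two criteria.
    """
    both_correct = response.agree_item == "yes" and response.disagree_item == "no"
    if not both_correct:
        return False
    if require_explanation:
        return response.explanation_adequate is True
    return True
```

A subject who answers “yes” to both parts fails under either setting, whereas a subject who answers both parts correctly but cannot justify the answers passes only the more lenient criterion.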

Each of the preceding methodological points 
is certainly more important to the normative- 
survey conservation literature than to the 
training literature which is the subject of 
the present review. Indeed, these methodo- 
logical issues are critical for investigators who 
wish to establish valid age norms for conserva- 
tion of the various quantitative invariants. 
However, it seems justifiable to note this 
controversy over conservation criteria in a
review of the experimental literature as a 
means of dispelling any incipient belief that 
the “unsuccessful” training studies that we 
consider tended to apply “tougher” conserva- 
tion criteria than did the "successful" studies. 
Instead, the relation between success and 
stringency of criteria is about zero. Some of 
the most successful experiments (e.g., Gruen, 
1965; Kingsley & Hall, 1967; Smith, 1968) 
employed very strict conservation criteria (in 
terms of the preceding three points), while 
some of the unsuccessful experiments employed
much simpler criteria (e.g., Mermelstein
& Meyer, 1969; Smedslund, 1963;
Wohlwill & Lowe, 1962).

Since the preceding methodological distinctions
provide no single or simple way to
account for the fact that some conservation-training
experiments have been successful and
others have not, the only reasonable criterion
of “success” for the experiments reviewed
must be statistical significance. Hence, insofar
as an investigator reports induction effects
at or above normally accepted levels of
confidence, we label the relevant study
“successful.”


Inductions of Number Conservation 


The initial first-order quantitative invariant 
to be conserved in the Piagetian sequence of 
cognitive development is number, or cardinal- 
ity (Piaget, 1952b). The child is said to ac- 
quire conservation of number in conjunction 
with the ability to seriate relations among 
objects that also involve the concept of 
numerosity. The number invariant is unique 
in that it is the only first-order invariant that 
is discontinuous by definition. The remaining 
first-order invariants considered in the present 
review may be treated as either continuous
or discontinuous—for instance, a length of 
string can be considered as a unit, or it can 
be divided into component segments. 

A typical number conservation problem 
consists of the following: Two rows of objects 
containing equal numbers of elements are 
placed before the child. The child is asked 
to count the objects in each row and to tell 
the experimenter whether or not the two rows 
are of equal cardinality (the 5–7-year-old
child usually agrees that they are). Next, the 
experimenter increases (decreases) the dis- 
tance between the objects in one row such 
that the transformed row is longer (shorter) 
than its mate. The child again is asked 
whether or not the rows contain the same 
number of elements. An affirmative answer
accompanied by a satisfactory explanation 
indicates that the child is conserving numeros- 
ity, while a negative answer and/or an un- 
satisfactory explanation suggest that number 
conservation is absent.


Of the four first-order invariants considered 
in the present review, number conservation
has received the most experimental attention.
From the earliest to the most recent, the 
relevant studies are: Wohlwill and Lowe 
(1962); Wallach and Sprott (1964); Beilin 
(1965); Gruen (1965); Wallach, Wall, and 
Anderson (1967); Winer (1968); Gelman 
(1969); Mermelstein and Meyer (1969); 
Rothenberg and Orost (1969). Seven of the 
studies were successful. In addition, there is 
an interesting thread of similarity running 
through the apparently disparate methodolo-
gies of the successful experiments. The earliest
experiment noted above (Wohlwill & Lowe,
1962), as well as a more recent one (Mermel-
stein & Meyer, 1969), failed to induce number
conservation. A tenth experiment that exam-
ined the differential potency of two number-
training methods (Feigenbaum & Sulkin,
1964) failed to include a control group and
is not considered here.

Wohlwill and Lowe (1962) employed three
distinct training procedures: trial-by-trial re-
inforced practice, practice adding and sub-
tracting single elements from rows of objects,
and dissociation of perceptual cues from
number cues. It seems fair to categorize
Wohlwill and Lowe's treatment conditions as
specific rather than general experience, since
the investigators made no direct attempt to
promote any of the cognitive transformations
that Piaget argues are responsible for the
conservation of quantitative invariants. There-
fore, the experiment does not appear to be
a direct test of Piagetian hypotheses about
the origins of number conservation.

Mermelstein and Meyer’s (1969) study
seems to represent even less of a direct test
of Piagetian theory than does the Wohlwill
and Lowe study. In fact, Mermelstein and
Meyer's report exemplifies the substantive
and procedural confusions that are present to
some degree in all of the conservation training
experiments (both successful and unsuccess-
ful). Their introductory comments evidence
some misconceptions about the details of
Piaget’s notions on the origins of number
conservation. For example, they maintain that
Piagetian theory eschews the possibility of
training conservations. As we hope our analy-
sis of the origins of conservation suggests,
the Piagetian position with respect to con-
servation training is more complex than this.⁵
Mermelstein and Meyer’s conceptual difficul- 
ties might be related to the fact that they 
reference only one primary source (Piaget, 
1967) which itself is not concerned primarily 
with number conservation. 

Substantive issues aside, Mermelstein and 
Meyer proceeded to “replicate” four conserva- 
tion training techniques reported by other in- 
vestigators (Beilin, 1965; Bruner, 1964; 
Sigel, Roeper, & Hooper, 1966; Smedslund, 
1961e). Three of the methods (Bruner, 1964; 
Sigel et al., 1966; Smedslund, 1961e) origi- 
nally were developed to induce substance 
conservation—not number conservation. None- 
theless, Mermelstein and Meyer employed 
these three procedures in a number induction 
situation. Moreover, the one method of the 
four that was developed to induce number 
conservation (Beilin, 1965) was not pre- 
cisely replicated, as the investigators specifi- 
cally admit. 

Even if Mermelstein and Meyer's training 
procedures had been precise replications, 
their posttests for the presence of induced 
conservation would have made their data diffi- 
cult to interpret. Having trained number con- 
servation, the investigators’ initial posttests 
were for substance conservation. This post- 
test assessment actually constituted a test for 
nonspecific transfer of training (cf. the section 
of the present review concerned with the 
transfer of induced conservation); the logic 
of such a posttest has been found to be spe- 
cious by Pinard and Laurendeau (1969). 
Further, the posttests for substance conserva- 
tion actually were extinction trials similar to 
Smedslund's (1961c) “countersuggestion” 
method for extinguishing weight conservation. 
Thus, by the time the subjects finally were 
posttested for induced number conservation, 
they already had received two conservation 
extinction trials. 


5 A further quotation might help to clarify the
exact position of the Genevan school vis-à-vis
whether or not conservations of quantitative invari-
ants are trainable in principle. In a discussion of
the acquisition of the conservations of number
and quantity, Inhelder and Matalon (1960) have
stated, “This process of acquisition which can, of
course, be accelerated by training, corresponds to
a general progress toward an ‘operational’ quality in
the thought of the child [p. 446].”


For these reasons, it is difficult to say ex- 
actly what Mermelstein and Meyer's study 
reveals about the experimental induction of 
first-order invariants such as number and 
substance. Although no evidence of increased 
substance conservation was noted, this is not 
surprising since Mermelstein and Meyer did 
not train substance conservation. Likewise, the 
failure to find increments in number conserva- 
tion is confounded by the fact that subjects 
received two conservation extinction trials be- 
fore being posttested for number. In short, 
the conceptual and procedural difficulties in- 
herent in this experiment preclude the possi- 
bility of testing any simple null hypothesis 
about conservation induction. 

The remaining attempts to induce number 
conservation (Beilin, 1965; Gelman, 1969; 
Gruen, 1965; Rothenberg & Orost, 1969; 
Wallach & Sprott, 1964; Wallach et al., 
1967; Winer, 1968) employed a variety of 
training procedures and were uniformly suc-
cessful. Some of the successful methods pro-
vided for the reinforcement of responses by
feedback (e.g., Gelman, 1969), but most did 
not. Some of the successful methods enriched 
subjects’ perceptual experience with a specific 
verbal rule (e.g., Beilin, 1965), but again
most did not. In sum, there appears to be a 
surprising degree of heterogeneity among the 
successful number-training procedures. 

There is at least one feature common to all 
the successful number-training procedures, 
however. All exposed subjects to situations 
which specified the object-bound form of 
operational reversibility (reversibility by in- 
version-negation). In some cases the specifica- 
tion of the inverse of a relevant transforma- 
tion was verbal (e.g., Beilin, 1965), while in 
others the specification of reversibility by in- 
version was perceptual (e.g., Gelman, 1969). 
In still other instances (e.g., Wallach et al.,
1967), some mixture of verbal and perceptual 
specification of the inverse of relevant trans- 
formations was employed. 

This apparent similarity among the meth- 
ods of the successful number-training experi- 
ments is particularly interesting in light of 
Piaget's thoughts about the origins of the 
conservation of quantitative invariants. It
will be remembered that Piaget attributes the
onset of first-order thought (with its subse-
quent first-order conservations) to the acqui-
sition of the two modes of cognitive reversibil-
ity (inversion-negation and reciprocity). It
will also be remembered that inversion-nega- 
tion is the object-bound form of operational 
reversibility. Given Piaget's conjectures and 
given the fact that conservation concepts are 
object-concepts, it is indeed interesting that 
Piaget's inversion-negation reversibility is im- 
plicit in each of the successful number-train- 
ing procedures. 

Within certain limits, the age of the sub-
jects used apparently is not strongly related
to success in inducing number conservation.
The successful experiments of Beilin (1965),
Gruen (1965), Gelman (1969), and Rothen-
berg and Orost (1969) employed subjects
that were as young as or younger than the sub-
jects of Wohlwill and Lowe's unsuccessful
experiment. 


Inductions of Length Conservation 


The conservation of space in one dimension
—length—is the next first-order quantitative 
invariant to be conserved in the Piagetian se-
quence of cognitive development. A test for
conservation of length consists of placing two 
sticks of equal length side by side such that 
their ends coincide. The child is asked whether 
or not the two sticks are the same length, and 
the 5–7-year-old child usually will respond
affirmatively. One of the sticks is moved for- 
ward (backward), and the child again is 
asked whether or not the two sticks are the 
same length. A second affirmative reply and 
an appropriate explanation indicate that the 
child is conserving the invariance of length.

Five studies (Beilin, 1965; Gelman, 1969; 
Gruen, 1965; Kingsley & Hall, 1967; Murray, 
1968) have reported training procedures that 
significantly improved the tendency to con. 


length conservation, Gelman's (1969) prob-
ably was the most elaborate. Gelman's stimuli


hope that they would learn to attend to only 
the relevant cues in the length conservation 
situation. The trials were divided into blocks 
of six, and the criterion for completion of


In light of the previously noted similarity 
among number-training experi- 
ments, it is interesting to note that Gelman's 
length-training procedure also tended to spec- 
ify the inversion-negation form of operational 
reversibility. For example, the diagram that
Gelman presented to summarize her procedure
indicated that whenever a particular percep-
tual transformation was inherent in a given
stimulus triad (e.g., one stick moved forward),
the inverse of that same transformation (the
stick replaced at its starting point) was
specified by some later triad. Hence, the
inversion-negation of perceptual transforma-
tions typically employed in conservation of


Using less elaborate means, the other ex- 
periments reporting successful training of 
length conservation (Beilin, 1965; Gruen, 
1965; Kingsley & Hall, 1967; Murray, 1968) 
also specified operational reversibility by in-
version to some degree. Murray added and
subtracted Müller-Lyer arrowheads from the
same line. Beilin (1965) and Gruen (1965)
specified the inverse of a perceptual trans-
formation via a verbal rule, while Kingsley
and Hall (1967) employed a learning set
technique that also specified the inverse of
perceptual transformations. 

In contrast to the preceding studies, 
Smedslund (1963) did not report significant 
increments in length conservation for any of 
five separate length-training procedures.⁶
Similar to Murray’s (1968) technique, Smeds-
lund's (1963) length-induction method in-
volved the addition and subtraction of Müller-
Lyer arrowheads from stimulus lines. Unlike
Murray’s procedure, however, Smedslund's
procedure added and subtracted segments
from the stimulus lines. This manipulation of
the stimulus lines disrupts the illusion and
therefore presents a new situation for the child 
to judge. In addition, Smedslund employed 
new stimulus materials on each trial. Not 
only does this constant introduction of new 
materials make it impossible to specify reversi- 
bility by inversion, but Morrisett and Hov- 
land (1959) have reported that analogous 
procedures militate against the formation of 
an appropriate learning set. According to 
Morrisett and Hovland, an optimal procedure 
to employ in concept induction involves a few 
sets of stimulus materials with repeated ex- 
posures to each set. 

For the most part, then, the experiments 
that have attempted to induce length conser- 
vation have been successful. Due to the afore- 
mentioned features of Smedslund's (1963) 
length-training procedure, his failure to induce 
length conservation offers no serious challenge 
to the successful experiments. Finally, as was 
the case for number-induction experiments, 
those training procedures that produced sig- 
nificant increments in length conservation also 
were those procedures that specified Piaget's 
object-bound form of reversibility. 

As was the case for inductions of number 
conservation, the age of the subjects used
apparently is not strongly related to success 
in inducing length conservation. Smedslund’s 
(1963) unsuccessful experiment employed sub- 
jects that were older than the subjects used in 
any of the successful experiments, save Mur- 
ray’s (1968). 


Inductions of Substance and Weight 
Conservations 


The final first-order quantitative invariants 
that have been subjected to experimental 
investigation are substance (also called mass 
or quantity) and weight. The conservation of 
substance refers to the knowledge that an ob- 
ject’s mass or amount of matter remains con- 
stant throughout changes in shape. Conserva- 
tion of substance is sometimes confused with 
volume conservation (a second-order invari- 
ant). For example, this confusion is present 
in a paper by Trabasso (1968), who failed
to note that while liquids may be used to
assess both substance and volume conserva-
tions, the respective questions which subjects
must ponder are different (“Do the beakers
contain the same amount of water?” versus
“Does the water take up the same space or
room in both beakers?”).

Weight conservation involves essentially the 
same assessment procedure as substance con- 
servation. To conserve weight, however, sub- 
jects must conclude that the weight of an 
object, as well as its gross amount of matter, 
remains constant throughout changes in shape. 
Hence, weight conservation may be thought of 
as substance conservation with an added 
weight judgment. 

The conservations of substance and weight 
typically are assessed with two clay balls of 
equal size and weight (although two beakers 
of liquid may be used also). Once the subject 
has agreed that the two balls contain equal 
amounts of matter or are of equal weight, 
the experimenter proceeds to alter the shape
of one of the clay balls. The subject again is 
asked whether or not the two pieces of clay 
contain the same amount of matter or are of 
equal weight. Affirmative answers and ap- 
propriate explanations are the criteria for the 
conservations of substance and weight. 

Experimental attempts to induce substance 
conservation have been reported by Smeds- 
lund (1961e, 1966), Brison (1966), Fleisch- 
mann, Gilmore, and Ginsburg (1966), Frank 
(reported by Bruner, 1966) and Sonstroem 
(1966). The latter two experiments investi- 
gated the differential efficacy of several induc- 
tion methods (e.g., inversion exposure, reci- 
procity exposure, perceptual screening). As 
was the case for Feigenbaum and Sulkin's 
(1964) number-training study, however, 
Frank and Sonstroem both neglected to in- 
clude appropriate control groups. In view of 
other data indicating a consistent improve- 
ment in nonconserving controls from pretest to 
posttest (probably as a function of familiar- 
ity), the experiments of Frank and Son- 
stroem are not considered further. 

Smedslund (1961e, 1966) and Brison
(1966) successfully induced substance conser- 
vation. Brison employed two identical glasses 
filled with equal amounts of water. The train- 
ing trials consisted of pouring the water from 
one of the glasses into a dissimilar container 
and then back into the original container. Fol-
lowing each pouring of water into the dissimi-
lar container, subjects were asked whether or
not the amount of water remained the same
and were told whether or not their responses 
were correct. Then, the water was returned to 
the original container, and the substance con- 
servation question was repeated. 

Given the earlier definition of reversibility 
via inversion, it can be seen that Brison's 
training procedure specified the inverse of a 
transformation that is relevant to substance 
conservation. Likewise, one of Smedslund’s 
successful substance-training techniques 
(1966) clearly specified inversion by employ- 
ing stimulus triads similar to those Gelman 
(1969) used to train number and length con- 
servations. Although Smedslund’s other suc- 
cessful substance-training procedure did not 
appear to specify the object-bound form of 
Piagetian reversibility, the efficacy of this 
procedure has been called into question by 
reported failures to replicate (Smith, 1968;
Wallach et al., 1967). 

A final substance-training experiment was 
reported by Fleischmann et al. (1966). The 
authors concluded that their data indicated 
“little effect of any of the conditions.” Al-
though Fleischmann et al.'s experiment is
probably more significant when considered as
a normative study, it is interesting to note
that the investigators consciously avoided
specifying any observable analogy of Piaget’s
object-bound form of reversibility.

Experimental inductions of weight conserva- 
tion have been reported by Kingsley and 
Hall (1967) and Smith (1968). Smith's suc- 
cessful weight-training method was a modifica- 
tion of Beilin’s (1965) procedure for training 
number conservation. Smith presented his 
subjects with pairs of identical clay objects,
one of which subsequently was deformed. The
experimenter asked the subjects whether or
not the two objects continued to weigh the
same. If the answer was incorrect, the princi-
ple of conservation was stated, and the in-
verse of the previous deformation was dem-
onstrated:


If we start with an object like this one [pointing to
the undeformed object] and we don't put any pieces
of plasticine on it or take any pieces away from it,
then it still weighs the same even though it looks
different. See, I can make it back into a . . . so it
hasn't really changed [Smith, 1968, p. 520].


Although Smith, like most of the other inves-
tigators, fails to mention the specification of
reversibility by inversion, it is evident from
the preceding quotation that such reversibil-
ity was indeed demonstrated.

Kingsley and Hall (1967) trained weight 
conservation with a complex learning-set pro- 
cedure that was adapted to the cognitive 
sophistication of individual subjects. Analysis 
of Kingsley and Hall's procedure indicates 
that the inverse of relevant transformations 
was specified by the training trials. 

In contrast to the successful experiments 
of Kingsley and Hall (1967) and Smith 
(1968), Smedslund has reported two failures 
to train weight conservation (Smedslund, 
1961b, 1961d). Smedslund's unsuccessful pro- 
cedures differ from the techniques of the suc- 
cessful experiments in that neither of his 
methods tended to specify a relevant form of 
reversibility. In addition, as was the case for
Smedslund’s (1963) length-training experi-
ment, both of the unsuccessful weight-train-
ing procedures are poorly conceived from the
standpoint of learning set (Morrisett & Hov-
land, 1959).


To summarize the results of experimental 
attempts to train the conservations of sub-
stance and weight, three studies (Brison,
1966; Smedslund, 1961e, 1966) have reported
successful inductions of substance conserva-
tion, and two studies have reported success-
ful inductions of weight conservation (Kings-
ley & Hall, 1967; Smith, 1968). Failures to
induce weight conservation have been re-
ported by Smedslund (1961b, 1961d) and by
Smith (1968) using a previously successful
substance-induction method (Smedslund,
1961e). Similar to the successful inductions of
number and length, those experiments pro-
ducing significant increments in substance and
weight conservations made use of treatments
that specified operational reversibility by in- 
version. 

For inductions of substance and weight con- 
servations, as for inductions of number and 
length conservations, the age of the subjects 
used does not differentiate the successful and 
unsuccessful experiments.


Transfer of Induced Conservation 


To reiterate some of the introductory com-
ments, at least two questions about the trans-
fer of induced conservation appear to be of
immediate interest. First, assuming one can
train subjects to conserve a particular in-
variant property, will subjects continue to
conserve that same property when confronted
with stimuli other than those on which they 
were trained (specific transfer)? Second, 
again assuming one can train subjects to con- 
serve a particular property, will subjects also 
be more likely to conserve nontrained—and 
dissimilar—properties (nonspecific transfer)? 

Evidence for the specific transfer of in- 
duced number conservation to novel materials 
has been reported by Wallach and Sprott 
(1964), Beilin (1965), and Gelman (1969). 
Moreover, no one has reported a failure to 
find specific transfer of induced number con- 
servation. Beilin (1965) and Gelman (1969) 
also have reported specific transfer of induced 
length conservation. Further evidence in sup- 
port of Beilin’s and Gelman’s data has been 
reported by Murray (1968). Again, there is 
no report of a failure to find specific transfer 
of induced length conservation. Brison
(1966) found specific transfer of induced 
substance, but neither of the studies reporting 
success in the training of weight conservation 
(Kingsley & Hall, 1967; Smith, 1968) exam- 
ined specific transfer phenomena. In short, 
those studies that have looked for specific 
transfer of induced first-order conservations 
have, without exception, found it. 

Conversely, there have been several failures 
to find evidence suggesting the occurrence of 
nonspecific transfer of induced conservations 
to dissimilar conservation concepts. Mermel- 
stein and Meyer (1969), for example, used 
the various training procedures advanced by 
Smedslund (1961e), Bruner (1964), Beilin 
(1965), and Sigel et al. (1966) and found 
that substance conservation was not facili- 
tated by number-induction methods. Also for 
the case of induced number conservation, 
Beilin (1965) found no transfer to area con- 
servation, Wallach et al. (1967) found no 
transfer to substance conservation, Gruen 
(1965) found no transfer to either substance 
or length conservations, and Winer (1968) 
found no transfer to either substance or 
height. Kingsley and Hall (1967) and Mur- 
ray (1968) reported no transfer of induced 
length conservation to substance and area con- 
servations, respectively. 

So far as induced substance and weight 
conservations are concerned, there have been
no satisfactory studies of nonspecific transfer.
Even though Sullivan (1967) did find some 
nonspecific transfer of induced substance con- 
servation, Kohlberg (1968) suggested that 
such results were probably obtained because 
Sullivan's “conservation” criteria were more 
perceptual than conceptual. Finally, Kings- 
ley and Hall (1967) reported some “nonspe- 
cific” transfer of induced weight conservation 
to substance conservation. Kingsley and Hall 
implied, however, that such transfer is not 
actually nonspecific, since weight and sub- 
stance conservations are almost identical (sub- 
stance conservation is simply weight conser- 
vation minus the terminal weight judgment). 

Although the preceding results concerning 
nonspecific transfer are largely nonsupportive, 
the picture is clouded somewhat by the data 
of Gelman’s recent study. Gelman (1969) 
found that both induced number and induced 
length conservations transferred to two tests 
of substance conservation—not “volume” 
conservation as Trabasso (1968) has mistak- 
enly reported. Unfortunately, Gelman’s ex- 
periment contains two features that make it 
difficult to integrate with the other transfer 
studies. First, Gelman subjected her experi- 
mental sample to both number and length 
training, whereas other studies have em- 
ployed multiple experimental groups with 
individual subjects receiving training on only 
one conservation concept. On this basis it is 
difficult to decide whether Gelman's results 
are attributable to number induction per se, 
length induction per se, or some interaction of 
the two (the latter hypothesis seems the more 
reasonable). Further, the conserving responses 
given by subjects who "transferred" suggest
that, like Sullivan's (1967), Gelman's nonspe-
cific transfer responses were more perceptual
than conceptual. 

In general, the results of these transfer 
studies are more supportive of Piagetian pre- 
dictions than not. The evidence for specific 
transfer of induced conservations to new ma-
terial appears clear and consistent. Also, it is
evident that more data on nonspecific trans-
fer are needed before any sweeping conclu-
sions can be made about this phenomenon.


7 The comments of Kingsley and Hall also are
applicable to a recent experiment by Rothenberg and
Orost (1969). Rothenberg and Orost found “non-
specific” transfer of induced number conservation
to conservation of discontinuous quantity. Like
weight and substance conservations, the conservation
of number and discontinuous quantity are more
similar than they are different.


Extinction of Induced Conservation 


As was the case for conservation transfer, 
two central questions may be asked concern- 
ing conservation extinction. If one considers 
conserving subjects in general (regardless of 
whether they acquired their conservation 
abilities “naturally” or were trained to con- 
serve), is there any difference in the absolute 
resistance to extinction of dissimilar conserva- 
tions? For each quantitative invariant, is the 
relative resistance to extinction different for 
induced conservers as opposed to “natural” 
conservers? Before considering the data, a 
few brief comments should be made about the 
characteristic extinction procedures employed 
in the relevant studies. 

All conservation training experiments begin 
with the division of their respective subject 
populations into conservers and nonconserv-
ers (on the basis of a conservation pretest).
Of course, it is the nonconservers who then
serve as experimental and control subjects for
the training part of the study. After the train-
ing sessions, certain of the studies also have
examined the absolute resistance to extinction
of their particular conservations and the rela-
tive resistance to extinction of trained sub-
jects versus those identified on the pretests as
natural conservers. All of these extinction pro-
cedures have been modeled after Smedslund's
(1961c) “countersuggestion” method for elim-
inating weight conservation. Briefly, the
countersuggestion procedure consists in the
violation of subjects’ expectations of conser-
vation. To accomplish this, the experimenter
allows subjects to make correct conservation
predictions, but before subjects can verify
their predictions, the experimenter covertly
alters one of the stimuli such that the in-
variant property actually is not conserved.

Brison's (1966) countersuggestion proce- 
dure employed a glass with a false bottom to 
violate predicted substance conservation, and
the technique is a good example of the meth-
ods typically employed to extinguish first-
order conservations. Brison presented subjects
with glasses that appeared identical and then
poured quantities of liquid that subjects knew
were identical into the two glasses. The false
bottom in one of the glasses always made
one of the quantities subsequently appear
greater, thereby violating subjects' predic-
tions. By instituting some simple variations,
this general technique has been used to extin- 
guish the tendency to conserve other first- 
order invariants. 

Of the various number-training experi-
ments, only Wallach and Sprott (1964) ex-
amined related extinction phenomena. They
reported that the absolute extinction resist- 
ance of number conservation is high for both 
natural and induced conservers. All of the 
reported attempts to induce length conserva- 
tion have failed to examine the extinction 
thereof. However, the extinction by counter-
suggestion of both substance and weight
conservations has been studied. 

Brison (1966) has assessed the extinction
of substance conservation and found the con-
cept to be highly resistant to countersugges-
tion in both natural and trained conservers.
Further, there was no tendency for natural
and trained conservers to extinguish at dif-
ferent rates.


Extinction of weight conservation was ex-
amined in the experiments of Smedslund
(1961c), Kingsley and Hall (1967), and
Smith (1968). Kingsley and Hall (1967) and
Smith (1968) reported identical findings with
respect to the extinction of weight conserva-
tion, namely, the concept was almost com-
pletely nonresistant to countersuggestion, and
there apparently was no tendency for natural
and induced conservers to extinguish at differ-
ent rates. Thus, these two studies did not
find natural weight conservers to be any more
resistant to countersuggestion than induced
conservers.

The original countersuggestion experiment 
reported by Smedslund (1961c) is the sole
exception to the conclusion that natural and
induced conservers do not appear to extin-
guish at different rates. Smedslund's experi-
ment investigated weight conservation, and
although the data indicated that the concept
of weight was not particularly resistant to
extinction (fully one-half of the natural con-
servers extinguished), Smedslund nonetheless
reported that induced conservers extinguished
more readily than natural conservers. Prima
facie, the data of Smedslund's brief report
seem clear and simple. Unfortunately, a cer-
tain feature of Smedslund's design would tend
to produce spuriously significant results in the 
observed direction. 

While Smedslund's experiment is frequently 
cited as authoritative proof that induced con- 
servers extinguish more readily than natural 
conservers (e.g., Anderson, 1965), insufficient 
attention has been given to the fact that 
Smedslund’s “induced” conservers were drawn 
from the experimental group of a previous 
study (Smedslund, 1961b) in which the 
training procedure failed to induce weight 
conservation. In other words, Smedslund's 
"trained" weight conservers (1961c) came 
from the same group of subjects that pre- 
viously had been inferred (1961b) to be not 
trained on the basis of a failure to reject the 
null hypothesis. Hence, it is not surprising 
that Smedslund's “trained” subjects extin- 
guished more thoroughly than their naturally 
conserving counterparts—statistically speak- 
ing, they were never trained to begin with. 
Moreover, Smith (1968) replicated Smeds- 
lund's procedure and failed to find different 
extinction rates for natural and trained weight 
conservers. 

Smedslund (1961c) also raised a substan- 
tive issue about conservation extinction that 
should be weighed briefly. As indicated ear- 
lier, Smedslund argued that Piagetian theory 
predicts more rapid extinction for trained con- 
servers than for natural conservers. However, 
Smedslund’s interpretation seems suspect for 
the following reasons. Piaget has studied
conservation phenomena precisely because he
believes there is more to intellectual develop- 
ment than cumulative experience and incre- 
mental changes in behavior. For Piaget, there 
are qualitative as well as quantitative changes 
and conservation of a first-order quantitative 
invariant falls in the former category. If one 
holds to this view of conservation as qualita- 
tive change and hypothesizes in light of it, 
it seems reasonable to predict that natural 
and trained conservers should be equally re- 
sistant to extinction. Regardless of how long 
it took to get there or what happened along 
the way, both adequately trained and natu- 
rally acquired invariants presumably involve 
the same heuristic leap from nonconservation 
to conservation. 

To sum up the extinction data, there is an 
indication that weight conservation is not very
resistant to extinction as compared with number
and substance conservations. This tentative
result seems most reasonable when one
considers that, ontogenetically, weight is one 
of the last first-order invariants to be con- 
served (Inhelder & Piaget, 1958). This result 
also is intuitively reasonable in that one of 
the more resistant conservations (substance) 
is a logical precondition for weight conserva- 
tion. Intuitively, one would expect less extinc- 
tion for the more primitive concept. 

With the exception of Smedslund's (1961c) 
experiment, none of the remaining reports 
found any difference in the rate of extinction 
for natural and induced conservers. In view 
of the theoretical significance of this result, 
it seems wise to suspend judgment on this 
point until independent verifications of the 
relevant studies have been reported. In addi- 
tion, the relative extinction rate hypothesis 
clearly merits testing for the case of the 
remaining first-order conservations. 


CONCLUDING COMMENTS 


To answer the initial question posed in the 
introduction, it seems reasonable to conclude 
that conservation of at least four of the first- 
order quantitative invariants can be acceler- 
ated by appropriate training procedures. In 
part, this conclusion is also affirmed by 
Flavell and Hill (1969), thereby repudiating 
Flavell's earlier (1963) position. In their 
annual review of developmental psychology, 
Flavell and Hill (1969) concluded: 


The early Piagetian training studies had negative 
outcomes, but the picture is now changing. If our 
reading of recent trends is correct, few on either side 
of the Atlantic would now maintain that one 
cannot by any pedagogic means measurably spur, 
solidify, or otherwise further the child's concrete- 
operational progress [p. 19]. 


The second question we asked concerned 
the types of methods that have been most 
successful in the training of first-order con- 
servations. In an early study, Wohlwill and 
Lowe (1962) grouped training methods into 
three categories: reinforcement, addition-sub- 
traction, and conflict. Although this classifica- 
tion has remained implicit in much of the 
subsequent conservation training research, the 
present review suggests its inadequacy with 
respect to two points. The classificatory 
scheme of Wohlwill and Lowe fails to discriminate
those successful and unsuccessful
training methods reviewed in the present
paper. Further, their classification does not
designate which cognitive skills (if any) are 
facilitated by the three categories of training 
methods. 

It was noted above that each of the suc- 
cessful training studies tended to specify the 
inversion-negation form of operational reversi- 
bility (a general experience as opposed to a 
specific one). Therefore, one possible method 
for classifying the training procedures (based 
on currently available data) is to divide them 
into those procedures that specify operational 
reversibility and those which do not. More- 
over, we contend that the data of the success- 
ful experiments strongly suggest that one 
critical factor promoting the induction of 
first-order conservations has been the object-bound
form of reversibility. Obviously, this
inference is congruent with Piaget's formulations,
summarized earlier in the present paper,
about the cognitive preconditions for acquisition
of first-order conservations.

CHARLES J. BRAINERD AND TERRY W. ALLEN

From the preceding conclusions, a model
might be formulated for constructing (or
reconstructing) conservation induction procedures.
Such a model is presented below, and
it at least provides a partial answer to the
second question posed in the introduction.
The unprimed letters (A, B, C, D) denote
objects or classes of objects, while the primed
letters (A', B', C', D') denote the same objects
after some relevant perceptual transformation
has been performed on them. Relevant
transformations are denoted by → and
their inverse by ←.


(a) A = B                    (d) C = D
(b) A → A' or B → B'         (e) C → C' or D → D'
(c) A ← A' or B ← B'         (f) C ← C' or D ← D', etc.


Each triad of trials (a, b, c, and d, e, f) is
concerned with the same objects. Although
this is a general model, it easily can be
adapted to any class of objects commonly
used to assess a given first-order conservation.
For example, objects such as plastic eggs and
eggcups might be used to train number
conservation. On the first trial, a, the equivalence
of eggs and eggcups would be established.
The second trial, b, might consist of moving
the eggs closer together (→) and questioning
subjects about the equivalence of the two sets
of objects following the transformation.
(Subjects might also be allowed to verify their
answers.) On the third trial, c, the eggs would
be returned (←) to their original positions.
The second triad of trials, d, e, f, would
involve moving the eggs further apart and
then inverting the transformation.
Inversion-negation reversibility might be specified for
the eggcups too, and other triads could be
devoted to suitable repetitions.

For example, the successful length-training
procedure used by Gelman (1969), and the
unsuccessful length-training procedure used
by Smedslund (1963) can be compared via
the preceding model. In Gelman's procedure,
a series of three colored sticks were Elements
A, B, and C of the preceding training model
(with C irrelevant for our purposes). The
following is a schematic summary of the
relevant portions of the procedure:


(a) A = B ≠ C
(b) B → B'
(c) B ← B'

The first step in the procedure established
the equivalence of two of the sticks and the
nonequivalence of a third stick for the child
(A = B ≠ C). In Step b, Line B is transformed
to Line B' by moving it forward
(B → B'). The transformation subsequently
is inverted by returning the transformed line
to its original place (B ← B'). Thus, a perceptual
analogy of operational inversion was
specified by the procedure in the form of a
negation (Step c) of a previous perceptual
deformation (Step b).

On the other hand, Smedslund's (1963)
unsuccessful training procedure used colored
sticks in conjunction with Müller-Lyer arrowheads
(Elements A and B of the preceding
model). In an addition/subtraction procedure,
the sticks were shown to be equal (A = B)
and then moved out to the arrowheads
(A → A'; B → B'), thereby producing the
common Müller-Lyer illusion of unequal
length. Next, a segment was removed from
the stick that had been placed on the outward
pointing arrowheads (A' → A''). The child
then was asked if the sticks were equal in
length or if they were unequal. The segment
then was returned (A' ← A''). This manipulation
was followed by the preceding conservation
question. In the next step, a segment
was added to the stick placed on the inward
pointing arrowheads (B' → B''), and this was
followed by the standard conservation question.
The added segment subsequently was
subtracted from the line (B' ← B''). The apparent
difference between Smedslund's and
Gelman's methods is that Smedslund did not
negate the initial transformation, that is, the
manipulation that produced the illusion in
the first place.

Smith's (1968) successful procedure and
Smedslund's (1961d) procedure are used to
apply the preceding training model to the
induction of weight conservation. Smith
employed two pieces of clay as stimulus objects
(Elements A and B of the model). The two
pieces of clay were presented to the child,
and the child established the equivalence of
the objects with respect to their weight
(A = B). Next, one of the pieces of clay
(B) was deformed by rolling it into a sausage
(B → B'). The child then was asked whether
or not the two pieces of clay were equal in
weight. If an incorrect answer was given, the
experimenter stated the principle of conservation
and negated the previous deformation by
rolling the clay back into its original form
(B ← B'). The above procedure was repeated
over several sessions.

Comparison of Smedslund's (1961d)
unsuccessful weight-training procedure with
Smith's method reveals a critical difference
between superficially similar techniques.
Smedslund also used pieces of clay of different
colors and forms (Elements A and B).
On half of the training trials, the stimulus
objects were unequal at the beginning of the
trial. Thus, A was heavier than B on half
the trials, although the two objects were of
equal volume. Before each trial, the inequality
(or equality) of the two objects was established
(A > B or A = B). One piece of clay
(B) was deformed by flattening it into a
pancake (B → B'). The child then was
asked a standard conservation question.
Instead of immediately negating the previous
deformation (flattening) as Smith (1968)
did, Smedslund subjected the object to
further deformations (B' → B''; B'' → B''';
B''' → B''''). A total of four one-way
transformations had been performed by the time
the trial finally was over. At the onset of
each subsequent set of trials, new stimulus
materials were introduced. In sum, a critical
difference between the procedures of Smith
and Smedslund appears to be that the former
negated relevant transformations, while the
latter did not.

In addition to the model for conservation 
induction, the authors feel that other meth- 
odological factors must be considered in the 
construction of optimal training procedures. 
For example, an optimal induction probably 
should employ a few sets of stimuli with a 
moderate number of exposures to each. Stimu- 
lus objects and transformations of the objects 
might be equated along appropriate dimen- 
sions, and one thereby could insure that the 
subjects within a condition were receiving 
identical treatment. Finally, an optimal 
method probably would contain provision for 
feedback concerning the accuracy of responses. 

To further summarize, it also seems reason- 
able to advance affirmative answers to Ques- 
tions c and e of the introduction. From the 
results of the experiments that have examined 
specific transfer of induced conservations, it 
seems evident that the phenomenon of spe- 
cific transfer does in fact occur. Data sup- 
portive of specific transfer are quite congruent 
with Piaget’s ideas about the generality of 
conservation concepts, and a failure to find 
specific transfer might well be construed as 
evidence that conservation induction was not 
adequate (i.e., specific transfer might be used 
as an acquisition criterion). So far as Ques- 
tion e is concerned, it seems likely that some 
first-order conservations are more resistant to 
extinction than others. The fact that resistance
to extinction may be related to the natural
acquisition sequence for first-order
conservations makes this result all the more
meaningful.

Due to inadequate data pertaining to Ques- 
tions d and f, as well as the obvious theoreti- 
cal import of these questions, we are forced 
to suspend judgment on the issues of non- 
specific transfer and the relative resistance to 
extinction of natural versus trained conservers. 
It is particularly unfortunate that the data 
about nonspecific transfer are incomplete, 
since solid evidence about this phenomenon 
could constitute a direct test of Piaget’s belief 
(e.g., 1949) that the conservations of first- 
order thought are rather limited and isolated 
constructions. Although most of the experi- 
ments failed to support nonspecific transfer, 
the somewhat questionable supportive studies 
of Sullivan (1967) and Gelman (1969) must 
be given due consideration. Concerning the
relative resistance to extinction of induced versus
natural conservers, all the acceptable studies
support our contention that it is not necessarily
the case that induced conservers should
be less resistant to extinction than natural
conservers. Analyses of Piagetian theory presented
by Inhelder (1962) and Furth (1969)
also tend to support this contention. But the
data nonetheless are tentative, so replications
and extensions are greatly needed.


Future Research 


One fairly evident direction of future research
is towards accurate replications of the
methods of the successful training studies,
especially those involving weight conservation.
Such replications should lay to rest whatever
resistance remains to the conclusion that
conservations can be trained (e.g., Mermelstein
& Meyer, 1969). Some of this work has
already begun: Smith (1968) applied Beilin's
(1965) successful number-training method to
the training of weight conservation; Winer
(1968) replicated Smedslund's (1961a)
"conflict" procedure.

As previously noted, there as yet has been
no attempt to train the first-order conservations
of height and area. If our knowledge
about first-order conservations is to be complete,
experimental inductions of height and
area are needed. Of course, inductions of
height and area conservations might provide
interesting data about the generality of the
training model proposed earlier.

Naturally, all the questions that have been
asked and/or answered about inductions of
first-order conservations must be posed again 
for second-order conservations such as volume, 
density, momentum, and rectilinear motion. 
While the questions remain the same, the 
answers may be quite different. Given the 
special characteristics of formal thought, one 
might expect that the training of only one 
form of reversibility may not be so effective 
in inducing second-order conservations as it 
apparently is in inducing first-order conserva- 
tions. Instead, one might predict that an 
induction method which promotes the coordi- 
nation of the two reversibilities of classes and 
relations would be more effective for training 
the second-order conservations. 

The comments about nonspecific transfer of
first-order conservations make clear the need
for more evidence before one can satisfactorily
answer the question of whether or not nonspecific
transfer occurs. Ideally, the solution
to the nonspecific transfer question should be
pursued as follows: For each first-order invariant,
one would choose that induction
method proven to be most effective in training
the particular invariant being considered. One
then would proceed to train multiple experimental
groups (one transfer group for every
first-order invariant, exclusive of the invariant
being trained). Once satisfied that the experimental
subjects were indeed superior to the
controls, one would assess the amount of nonspecific
conservation transfer to each of the
nontrained first-order invariants. Repetitions
of this multiple-group method for the case of
each first-order invariant ultimately would
yield a many-way classification matrix.
Such a matrix would specify the nonspecific
transfer (if any) of any and all first-order
conservations to any other first-order
conservation.
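
As a rough illustration of the proposed design, the following Python sketch lays out the transfer groups and the empty classification matrix. It is a minimal sketch under stated assumptions: the list of six invariants is simply those named in this review (number, length, substance, weight, height, area), and the function names are hypothetical conveniences rather than anything specified in the text.

```python
from itertools import permutations

# Assumed list: the first-order invariants mentioned in this review.
FIRST_ORDER_INVARIANTS = ["number", "length", "substance",
                          "weight", "height", "area"]


def transfer_design(invariants):
    """Map each trained invariant to the untrained invariants to be tested.

    One transfer group is needed per (trained, tested) pair, exclusive of
    the invariant being trained.
    """
    return {trained: [other for other in invariants if other != trained]
            for trained in invariants}


def empty_transfer_matrix(invariants):
    """Build the trained-by-tested matrix; None marks a cell without data."""
    return {pair: None for pair in permutations(invariants, 2)}


design = transfer_design(FIRST_ORDER_INVARIANTS)
matrix = empty_transfer_matrix(FIRST_ORDER_INVARIANTS)
print(design["weight"])   # invariants tested after weight training
print(len(matrix))        # number of off-diagonal cells to be filled
```

With six invariants the design calls for five transfer groups per trained invariant and thirty cells in all, which gives some sense of the experimental effort the proposal implies.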

The preceding method for studying nonspecific
transfer could be generalized to
experimental inductions of second-order
conservations. However, the predictions about
nonspecific transfer of second-order conservations
are the inverse of the predictions for
first-order conservations. The postulated generality
of second-order thought leads one to
expect significant nonspecific transfer among
quantitative invariants such as volume,
density, momentum, and rectilinear motion.

Questions about the extinction of first-order
conservations also remain to be answered
by future experiments. Investigations
of the extinction of height and area conservations
may well decide the validity of the
assertion that the more primitive first-order 
conservations should be more resistant to 
extinction. Also, more research is needed that 
is concerned with the problem of the relative 
resistance to extinction exhibited by induced 
and natural conservers. In the event that 
further research continues to find no differ- 
ence in the relative extinction resistance of 
induced versus natural conservers, a point 
will have been scored in favor of hypothesized 
qualitative changes in cognition. The accept- 
ability of these studies of relative extinction 
resistance will turn on the adequacy of their 
conservation acquisition criteria. 

In the absence of any studies concerned 
with second-order conservations, it is trivially 
apparent that much remains to be done with 
second-order extinction phenomena. Given the 
postulated features of second-order thought 
(a thoroughly integrated system of compen- 
sating operations), one would expect that all 
of the second-order conservations (whether 
induced or natural) would be less resistant 
to extinction than the least resistant of their 
first-order counterparts. Also because of the 
postulated features of second-order thought, 
one could predict that the extinction of one 
second-order conservation might well irradiate 
to other second-order conservations. 

As a final note on future research, it is our 
hope that future investigators will be more 
systematic and rigorous in formulating and 
examining specific substantive issues such as 
“reversibility,” “decentering,” and “learning 
set.” Without more adequate formulations of 
to-be-tested principles, the present unsatisfac- 
tory state of affairs with respect to con- 
servation induction will continue to obtain, 
namely, a relative predominance of inferential 
conclusions over direct conclusions.


One hundred and sixty-six studies are reviewed of predictions of outcome of
individual psychotherapy with adult patients. Predictors are classed as patient,
therapist, or treatment factors; the number of predictors which were significant
versus nonsignificant are tallied. By far, the largest number deals with
patient factors; relatively few deal with therapist or treatment. Those patient
factors which were most often significantly associated with improvement are
psychological health or adequacy of personality functioning, absence of
schizoid trends, motivation, intelligence, anxiety, educational and social assets,
and experiencing (rated from early sessions). Therapist factors are experience,
attitude and interest patterns, empathy, and similarity of patient and therapist.
The treatment factors revealed one main trend: the number of sessions. The
review ends with a methodological evaluation and a suggestion for
cross-validation of the main predictors.


When a patient and psychotherapist agree 
to meet, is what follows largely an unpredict- 
able venture? Most psychotherapists believe 
it is predictable because patients, as a group, 
will improve; it is unpredictable because only 
a few of the factors influencing the fate of the 
individual patient in psychotherapy can be 
discerned, even after a thorough initial evaluation
and even after the early sessions. All
psychotherapists agree on this one fact: Some
patients seem to improve; others do not. 
Responsibility for such differences could theo- 
retically be traced to a variety of sources— 
the qualities of the patient and therapist, the 
mode of treatment, or some higher order 
interaction of these factors. 

Digging through past research does not un- 
earth an easy path to the relative influences 
of those factors. The line of inquiry with 
the deepest roots in practice is that of 
clinical research. Through this route Freud 
(1913) concluded, in terms of his well-known 
analogy between chess and psychotherapy, 
that we know only some of the opening 
and closing moves; for the rest we have 
only intuitively applied guidelines. Therefore, 
“this gap in instruction can only be filled by a 
diligent study of games fought out by 
masters [p. 123].” This kind of tutorial exercise
or apprenticeship training, coupled with
self-scrutiny, has been one of the primary
means of ferreting out factors governing the 
outcome of treatment. Among the attempts to 
map out the area, three classical clinical 
papers serve as landmarks: Freud (1937), 
Rogers (1957), and Rosenzweig (1936).
Qualities of the therapist thought to influence 
the course of psychotherapy were catalogued 
by Holt and Luborsky (1958); a similar 
series of patient qualities was enumerated by 
Wallerstein, Robbins, Sargent, and Luborsky 
(1956). Also available are several compre- 
hensive reviews, for instance, Wolberg (1967). 
The relatively newer and fewer quantitative 
studies need more systematic reviews. Some of 
them are part of surveys of prediction of 
change in mental patients regardless of type 
of treatment (Fulkerson & Barry, 1961; 
Windle, 1952), some are in a review of issues 
and trends in psychotherapy research (Strupp 
& Bergin, 1969a), and some are part of a
book in preparation (Bergin & Garfield, 
1970). The impact of the quantitative re- 
search on the practice of psychotherapy has 
been negligible (Luborsky, 1969). Clinical 
research and quantitative research have 
tended to stay distant from each other; those 
who know one tend not to know the other. 
The scattered studies which may come to a 
therapist’s attention often have contradictory 
findings and lack clinical sophistication in 
their conception and interpretation. Whether
or not a skeptical attitude is justified on the 
basis of the existing quantitative research 
can only be judged from a thorough overview 
—which is the primary goal of this survey. 

Our present review of the quantitative re- 
search is intended to serve several specific 
purposes: 

1. To offer guidelines to clinicians in the 
form of lists of qualities of patient, therapist, 
and patient-therapist interaction which have 
been shown to relate to various criteria of outcome.
The lists are being developed into an
easily applied prognostic index.² These guidelines
may eventually suggest modifications to
psychotherapy in the interest of improving
therapeutic results. For example, if it turns
out that the therapist's empathy leads to
benefits for the patient, therapists who have
or can develop that quality will be in demand.

² Auerbach, A. H., & Luborsky, L. A prognostic
index for psychotherapy. In preparation, 1970.

2. To systematically compare the clinical 
and quantitative lists of factors. What should
be investigated by quantitative research will 
be highlighted when we find clinical areas 
with no quantitative exploration. 

3. To provide a methodological evaluation 
of the research as a guide to future investiga- 
tion. 


Limits of the Literature Search 


To accomplish these aims, all quantitative 
studies of the factors which influence outcome 
of individual psychotherapy for adult patients 
were examined. They were included if there 
was at least some attempt to provide reason- 
ably controlled comparisons, and the con- 
clusions were passably supported. This meant 
including all relevant quantitative studies ex- 
cept for a few poorly conceived or ambiguous 
ones. 

For the definition of psychotherapy, we 
followed the lead of Zax and Klein (1960) in 
their review of the types of changes that occur 
via psychotherapy. They put limits on the 
scope of their search by following Snyder’s
(1947) definition of psychotherapy, which 
rules out research primarily on educational 
and guidance activities emphasizing the giving 
of information. Also excluded were occupa- 
tional therapy, shock therapy, chemotherapy, 
behavior therapy, and laboratory analogues 
of psychotherapy unless these latter were 
compared with psychotherapy. Excluded from 
the main body of the review were articles 
predicting only the length of the treatment, 
or only the patient’s remaining or leaving, 
rather than the gains made by the patient. 
Most studies were omitted that were avail- 
able only in the form of unpublished theses 
or in foreign language journals. Governed by 
these delimitations, 166 studies were finally 
included in our review (see Appendix), and 
form the substrate for our conclusions. They 
cover a period of 23 years of research—from 
1946 through 1969. Most of our eligible 166 
are from Strupp and Bergin’s (1969b) bibli- 
ography of all types of individual psycho- 
therapy research through 1967 in which there 
are listed approximately 2,700 publications.
Obviously, quantitative prognostic studies of
factors influencing change in individual
psychotherapy have been relatively scarce.


Main Factors Influencing Outcome of Psychotherapy


The main factors can be organized within 
the model presented by Sanford (1962). It 
is a diagram applicable to any social system 
designed to change the person who passes 
through it (Figure 1). 

Patient 1 (P-1) refers to the patient be- 
fore he begins treatment; Therapist 1 (T-1) 
to the therapist before he begins interacting 
with the patient; the P-T Interaction rec- 
tangle refers to the period of treatment; 
Patient 2 (P-2) to the patient at termination 
of treatment; and Patient 3 (P-3) to the 
patient one year after the treatment has 
ended. The model suggested the divisions 
under which we classified results of each 
study: I: Patient Factors (before Treatment 
and Judged from the Sessions) ; II: Therapist 
Factors (before Treatment and Judged from 
the Sessions); III: The Match between Pa- 
tient and Therapist (Patient and Therapist 
Assessed Apart from Treatment); IV: Treat- 
ment Factors. 

With this organization of the Appendix, one 
can see at a glance the direction of the 
findings in each division: A plus sign (+)
indicates a significant positive relationship
between predictor and criterion; a minus sign
(−) a significant negative relationship; a
question mark (?) a significant relationship
but unclearly related to the main trend under
that heading; a zero (0) a nonsignificant
relationship. An asterisk indicates that the
therapist's rating of outcome was a criterion.
(For each study we first prepared a detailed, 
one-page summary which included sample 
size, type of patients, treatment mode, and 
predictive and criterion measures.) 
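
The coding scheme just described lends itself to a simple tally. The Python below is a minimal sketch, not the authors' procedure: the function name `tally_findings`, the use of an ASCII hyphen for the minus sign, and the example codes are all assumptions made for illustration.

```python
from collections import Counter

# Codes as described in the text; "-" is an ASCII stand-in for the minus sign.
VALID_CODES = {"+", "-", "?", "0"}


def tally_findings(codes):
    """Count significant and nonsignificant findings under one heading."""
    counts = Counter(codes)
    unknown = set(counts) - VALID_CODES
    if unknown:
        raise ValueError(f"unrecognized codes: {sorted(unknown)}")
    return {
        "positive": counts["+"],
        "negative": counts["-"],
        "unclear": counts["?"],
        # Every code except "0" marks a statistically significant finding.
        "significant": counts["+"] + counts["-"] + counts["?"],
        "nonsignificant": counts["0"],
    }


# Hypothetical codes for one heading, not taken from the Appendix.
example_codes = ["+", "+", "+", "-", "0", "0"]
summary = tally_findings(example_codes)
print(summary)
```

Keeping positive, negative, and unclear findings in separate counts preserves the direction of each result, which a single "significant" total would hide.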

We should note that there is a distinct 
asymmetry in the weights to be attached to 
significant and nonsignificant findings—the 
latter receiving much less weight. Obviously, 
a nonsignificant result does not warrant the 
positive conclusion of no relationship, but 
merely the absence of evidence sufficient to 
conclude that a relationship exists. Given the 
likely poor statistical power of much of this 
research (small samples, measurement-error 
attenuated relationships), this point takes on 
particular force (Cohen, 1962; 1965, pp. 95- 
101). 
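
The force of the power argument can be seen in a short calculation. The sketch below is an illustration under stated assumptions, not a computation reported in the review: it uses the standard Fisher z approximation for a two-sided test of a zero correlation, and the effect size (.30) and sample sizes are hypothetical values chosen only to show the pattern.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_power_correlation(rho, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 (Fisher z)."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z approximation")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the test statistic is roughly normal with this mean.
    shift = atanh(rho) * sqrt(n - 3)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical values: a moderate true correlation studied with small
# and with large samples.
small_sample_power = approx_power_correlation(0.30, 30)
large_sample_power = approx_power_correlation(0.30, 200)
print(round(small_sample_power, 2), round(large_sample_power, 2))
```

Under these assumptions a study of 30 patients would detect a true correlation of .30 well under half the time, so a tally of nonsignificant results from small studies says little about whether a relationship exists.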

A few qualities remain significantly pre- 
dictive across several studies in the Appendix, 
despite different patients, different forms of 
treatment, and different criteria of outcome. 
Only a few of them manage this feat, but by 
accomplishing it become more worthy of our 
attention. These qualities are listed in Table 1 
and then are reviewed in more detail. 


I: PATIENT FACTORS


Adequacy of General Personality Functioning 


We have included terms from a variety of 
psychological languages—mental health versus 
sickness, pathology, psychotic tendencies, in- 
tegration, ego strength, adjustment, and dys- 
function. Probably all arrive at similar global 
estimates of adequacy of general personality 
functioning. Of the 28 studies that fall within 
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Fic. 1. Plan for assessment of the patient and therapist (cf. Sanford in Strupp & Luborsky, 1962, 
p. 155) before, during, and after going through the psychotherapy change system. 
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TABLE 1 


CONDENSED SUMMARY OF MAIN TRENDS—NUMBER OF QUANTITATIVE STUDIES WITH 


droxteroawe versed NONSIONIFICANY RIELATIUASIP MPH PenoreroR 


AND OUTCOME MEASURES 


Number of studies 
Main trend 
Significant" Nonsignificant 
Patient Factors before Treatment (pp. 162-1715) 
Adequacy of Personality Functioning: 
Integration, mental health, etc. 6;1— 8 
Miscellaneous test findings 15 2 0 13 
Ego strength 4 5 
TAT adequacy 2 0 
Rorschach Prognostic Rating Scale 6 3 
Rorschach (general) 11 7 
Diagnosis, especially absence of psychotic trends T 0 
Motivation 4 1 
Expectation 3 2 
Intelligence 7 2 
Other Intellectual Skills 3 1 
Anxiety 54 3} 
Presence of other affects 5 0 
Human relations interest 4 0 
Age 4;2— 5 
Social class 2;1— 2 r^ 
Education 5 2 
Student status 3 1 
Previous psychotherapy 0 3 
Patient Factors as Judged from the Treatment (pp. 171-172) 
Likability 2 0 
Experiencing 6 0 
"Therapist Factors Before Treatment (pp. 172-174) 
Experience 8 4 
Skill 3 2 
Interest pattern and attitudes 6;1— 2 
Therapist Factors Judged from the Treatment (pp. 174-175) 
Empathy (judged from tape recordings) 3 3 
Other empathy measures 4 2 
The Match between Patient and Therapist (pp. 175-176) 
Similarities between patient and therapist 10; 1—; 3 
1 curvilinear 
Treatment Factors (pp. 176-179) 
Time-limited versus unlimited treatment 2;1— 5 
Number of sessions 20 2 
Waiting time before beginning psychotherapy 3 0 


a Page numbers in Appendix of summaries, 


b Minus sign indicates number of negatively significant studies. 


this category, 15 show a significant relation- 
ship between the level of initial personality 
functioning and outcome of treatment; of 
these, 14 are in the positive direction. They 
indicate that the healthier the patient is to 
begin with, the better the outcome—or the 
converse—the sicker he is to begin with, the 
poorer the outcome. Only one study indicates
a significant negative relationship, that is, the
sicker the patient, the better the outcome: Gotts-
chalk, Mayerson, and Gottlieb (1967) showed
that the patients with the higher psychiatric
morbidity scale ratings fared the best in the
six-session, short-term treatment. The remain-
ing 13 studies are nonsignificant.

Many diverse studies contributed to our
findings. It was our hope to discern differ-
ences in methods or samples for the 14 posi-
tively significant versus the 13 nonsignificant.
Several types of interstudy differences were 
examined: (a) The severity of illness in the 
sample for each study: No obvious differences 
appear. (b) The use of difference scores 
versus improvement ratings as a criterion: 
Difference scores may make positive findings 
less likely (see section entitled Evaluation of 
Criteria, Item 5). Three of the four studies 
using difference scores are in the nonsig- 
nificant group (Cartwright & Roth, 1957; 
Klein, 1960; Luborsky, 1962); one just 
reaches significance (Fiske, Cartwright, & 
Kirtner, 1964). (c) The type of initial assess- 
ment in the predictors: The Appendix studies 
are grouped according to the type of assess- 
ment—observer ratings (14), miscellaneous 
(2), Barron Ego Strength (9), TAT (2). No 
obvious differences appear within those sub- 
groups. 
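
The box-score arithmetic behind these counts can be sketched in a few lines. This is only an illustrative tally, not part of the original review: the function name `tally` and the dictionary layout are assumptions, and the inputs are the counts reported above for general personality functioning (14 positively significant, 1 negatively significant, 13 nonsignificant).

```python
from fractions import Fraction


def tally(positive, negative, nonsignificant):
    """Summarize a box-score count of studies for one predictor.

    Hypothetical helper for illustration only; it counts studies and
    does not weigh them by sample size or effect size.
    """
    significant = positive + negative
    total = significant + nonsignificant
    return {
        "total": total,
        "significant": significant,
        # Share of all studies that were significant in the positive direction.
        "share_positive": Fraction(positive, total),
    }


# Counts reported in the text for adequacy of general personality functioning.
functioning = tally(positive=14, negative=1, nonsignificant=13)
print(functioning["total"], functioning["significant"], functioning["share_positive"])
```

A tally of this kind only counts studies; it says nothing about the size of the relationships or the quality of the individual samples.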

The Rorschach test results were sum- 
marized separately because of the difficulty 
of knowing exactly what was being measured. 
Among the prognostic tests, the Rorschach 
Prognostic Rating Scale (RPRS) of Klopfer, 
Kirkner, Wisham, and Baker (1951) turned 
out to be a big surprise. In nine studies the 
RPRS had been applied before therapy; in 
six of the nine, a significant positive relation- 
ship emerged; high RPRS is associated with 
improvement in psychotherapy. These Ror- 
schach scores are weighted combinations of 
six variables thought to be related to ade- 
quacy of personality functioning. This result, 
therefore, is consistent with the significant 
studies under the heading of General Person- 
ality Functioning (described previously). Of 
18 other studies in which the Rorschach test 
had been used less systematically (i.e., con-
ceptual relationship of variables to initial per-
sonality functioning and to outcome of treat-
ment was not considered), 11 were found
where test signs predicted outcome. Since the 
Rorschach test can yield many different 
scores, these results are less impressive than 
those of the single RPRS. 

Two main conclusions emerge. First, the 
initial level of the patient’s illness is a crucial 
factor: Initially sicker patients do not im-
prove as much with psychotherapy as the
initially healthier do. (Possibly, as Astrup
believes [Astrup & Noreik, 1966], the pa-
tient's qualities are even more important for
his improvement than the psychotherapy or
other treatment he receives.) Second, some
improvement is shown by patients, whatever
their initial level of functioning (e.g., Klein, 
1960; Luborsky, 1962). A safe prediction, 
therefore, is that any method of psycho- 
therapy in which one person tries to help an- 
other will usually yield gains for the one de- 
signated to be the patient. (A story with a 
similar point has been persistently retold 
with glee around the Menninger Hospital: 
A visitor once asked the receptionist, “How
can you tell the patients from the doctors?
They all look alike.” The receptionist replied,
“The patients get better.”)


Diagnosis (Especially, Absence of Psychotic 
Trends) 


The implications here are similar to the 
findings under the section Personality Func- 
tioning. In various samples, the more serious 
diagnoses (involving the terms schizophrenia, 
psychotic trends, or psychosis) are associated 
with less improvement in psychotherapy 
(Gottschalk et al., 1967; Hamburg, Bibring, 
Fisher, Stanton, Wallerstein, Weinstock, & 
Haggard, 1967; Harris & Christiansen, 1946; 
Karush, Daniels, O'Connor, & Stern, 1968; 
Katz, Lorr, & Rubinstein, 1958; Stephens & 
Astrup, 1963; Tolman & Mayer, 1957). 
Within the schizophrenic groups, the severity 
distinction of process versus nonprocess has a 
similar predictive power for the outcome of 
psychotherapy (Stephens & Astrup, 1963). 


Motivation and/or Expectation 


The common clinical opinion of the value 
of good motivation for treatment is upheld 
by four out of five studies (R. Cartwright & 
Lerner, 1963; Conrad, 1952; Schroeder, 1960; 
Strupp, Wallach, Jenkins, & Wogan, 1963). 
The nonsignificant study is by Siegel and Fink 
(1962). Patient's expectation of change is 
similarly predictive in three out of five studies 
(Goldstein & Shipman, 1961; Lipkin, 1954; 
Uhlenhuth & Duncan, 1968); two are non- 
significant (Brady, Reznikoff, & Zeller, 1960; 
Goldstein, 1960). 

Although amount of motivation and/or 
expectation tends to be positively related to 
outcome, type of motivation is not predictive 
(Gliedman, Stone, Frank, Nash, & Imber,
1957); surprisingly, patients with congruent
motives for treatment do not fare better than
those with noncongruent motives (e.g., treat-
ment should change their life situation, rather
than themselves). Similarly, type of trans-
ference expectation is not predictive (Apfel-
baum, 1958).

In two studies (Goodman, 1960; Rosen- 
baum, Friedlander, & Kaplan, 1956), pay- 
ment of a fee is related to gains. Motivation 
may be implicated—payment of a fee may 
(a) increase motivation, or (b) presuppose 
good motivation. Another condition may hold: 
Those who are able to pay a fee may have 
other social assets which make treatment for 
them more auspicious (see section entitled 
Social Achievements). 


Intelligence 


Seven of the nine studies using different
ways of estimating intelligence show that
patients with higher initial intelligence per-
formed better in psychotherapy. All but two
of the nine studies are based on the Wechsler
Intelligence Tests, either full-scale or four
subtests. The significant studies are by Barron
(1953a); Casner (1950); Fiske et al. (1964);
Miles, Barrabee, and Finesinger (1951); 
Rioch and Lubin (1959); Rosenberg (1954); 
Zigler and Phillips (1961). Harris and Christi- 
ansen (1946) and Rosenbaum et al. (1956) 
show a nonsignificant relationship. 

Three out of four more diverse estimates of 
intellectual skills are in the same direction 
(Barry & Fulkerson, 1966; McNair, Lorr, 
Young, Roth, & Boyd, 1964; Sullivan, Miller, 
& Smelser, 1958). One obvious way of under-
standing these findings is that psychotherapy
requires learning, and those who learn most
readily do better.


Affect 


We found nine studies of psychotherapy in 
which initial anxiety level was assessed. In 
five, a significant relationship was obtained 
between high initial anxiety and a criterion 
of change (Gallagher, 1954; Gottschalk et al., 
1967; Hamburg et al., 1967; Kirtner &
Cartwright, 1958a; Luborsky, 1962). Non-
significant studies were Bergin and Jasper
(1969); Distler, May, and Tuma (1964);
Katz et al. (1958); Roth, Rhudick, Shaskan, 
Slobin, Wilkinson, and Young (1964). In one 
“nonsignificant” study (Distler et al., 1964), 
a positive significant relationship was found 
for women and a nonsignificant one for men. 
In sum, five and “one-half” studies confirm 
that patients with high anxiety at the initial 
evaluation or at beginning of treatment are 
the ones likely to benefit from psychotherapy. 
High initial anxiety probably indicates a 
readiness, or at least an openness, for change. 

Under the heading of Other Affects we have 
listed five other studies suggesting that it is 
not only initial anxiety which is a good prog- 
nostic sign, but the presence of any strong 
affect, such as depression (Astrup & Noreik, 
1966; Conrad, 1952; Gallagher, 1954; Gotts-
chalk et al., 1967; Uhlenhuth & Duncan,
1968). Patients with flattening of affect have 
a poor prognosis (Astrup & Noreik, 1966). 
These findings are not unique to psycho-
therapy; strong initial affect appears to be a
good prognostic sign for a variety of other
treatments as well, for example, treatment by
drugs (Beecher, 1959) or, possibly, no formal
treatment at all, as in the case of acute
depression. The overall conclusion
about affects is that almost any affect
is better than no affect, and that anxiety and 
depression are probably the two “best” initial 
affects. The presence of these strong affects 
may indicate the patient is in pain and asking 
for help. The absence of affect very likely goes 
along with a state in which the patient is not 
reaching out for help, or has given up. 

In two studies the number of complaints 
on the Symptom Check List was found to be 
a positive sign (Stone, Frank, Nash, & Imber, 
1961; Truax, Wargo, Frank, Imber, Battle, 
Hoehn-Saric, Nash, & Stone, 1966)—the im- 
plication may be the same, that the patient is 
in pain and asking for help. The Symptom 
Check List score is probably not primarily a 
“severity of illness” measure; it is best classi-
fied here along with anxiety and other affects.


Ethnocentrism 


Ethnocentrism is a significantly negative 
predictor in two out of three studies (Barron, 
1953a; Tougas, 1954; it is nonsignificant in
Rosen, 1954).


Human Relations Interest


Four studies show this to be a promising 
characteristic of patients in psychotherapy 
(Gottschalk et al., 1967; Isaacs & Haggard, 
1966; Rayner & Hahn, 1964; Rosenbaum 
et al., 1956). 


Coping or Defensive Style 

Defensiveness was shown in two out of three
studies to be negatively related to improve-
ment in psychotherapy (Strupp et al., 1963;
Zolik & Hollon, 1960); one study was nonsig-
nificant (Raskin, 1949).


Somatic Concern 


Somatic and health concerns were shown to 
be negative indicators in two studies (Rosen- 
berg, 1954; Stone et al., 1961). 


Self-Awareness, Insight, and Sensitivity 


Three out of five studies showed these 
qualities were positively related to outcome 
(Conrad, 1952; Rosenberg, 1954; Zolik & 
Hollon, 1960—nonsignificant were the studies 
of Raskin, 1949; Rosenbaum et al., 1956). 


Age 

Older patients tend to have a slightly poorer 
prognosis. Of 11 studies, 4 show that younger 
patients profited more from psychotherapy 
(Casner, 1950; Hamburg et al., 1967; Stone 
et al., 1961; Zigler & Phillips, 1961). Non- 
significant results were obtained by Bloom 
(1956), D. S. Cartwright (1955), Gaylin 
(1966), Gottschalk et al. (1967), Seeman 
(1954). The two negative studies that found 
older patients did better were in the context 
of a limited age range (Conrad, 1952; Knapp, 
Levin, McCarter, Wermer, & Zetzel, 1960). 


Sex 


In five studies men and women have about 
the same chances of benefiting from psycho- 
therapy (D. S. Cartwright, 1955; Gaylin, 
1966; Hamburg et al., 1967; Knapp et al.,
1960; May, 1968). In two studies, however, 
the women did better (Mintz, Luborsky, & 
Auerbach, 1971; Seeman, 1954). 


Social Achievements 


In general, patients with higher social 
achievements are better suited for psycho-
therapy. This would be expected, because
people who can achieve in spheres requiring 
social skills should also do so in psycho- 
therapy. Various social achievements have 
been examined: socioeconomic (social class), 
occupational, educational, and marital. Of 
these, educational achievement has most sup- 
porting studies (Bloom, 1956; Casner, 1950; 
Hamburg et al., 1967; McNair et al., 1964;
Sullivan et al., 1958). Of the two nonsigni-
ficant studies (Knapp et al., 1960; Rosen-
baum et al., 1956), the study by Knapp et al.
was based on uniformly well-educated psy- 
choanalytic patients. 

Combinations of these achievements into 
comprehensive measures of social competence 
have been successful predictors of improve- 
ment with psychotherapy (Stone et al., 1961) 
and with hospitalization (Zigler & Phillips, 
1961). 


Student Status 


Three studies (D. S. Cartwright, 1955; 
Casner, 1950; Rogers & Dymond, 1954) 
found that student status is associated with 
improvement; one found that it makes no 
difference (Gaylin, 1966). The facilitation 
provided by student status is probably a 
function of the similarity felt between the 
patient and therapist and/or the fact that 
being a student implies social competence. 
The same reasoning may partly explain the 
finding that professional people, including 
patients who are analytic candidates or psy- 
chiatrists, are more likely to complete treat- 
ment than the general population patients 
(Hamburg et al., 1967). 


Previous Psychotherapy 


Findings in three studies (Hamburg et al., 
1967; Klein, 1960; McNair et al., 1964) agree 
that previous psychotherapy makes no signifi- 
cant difference in predicting the outcome of 
a patient’s current psychotherapy! This finding
is hard to explain; it would seem natural to expect
that previous response would predict future
response.


Patient Factors (Judged from the Sessions)


This obviously is a profitable area. Most of 
the studies involving judgments of the early 
phases of the treatment turned out to be pre-
dictive of the final outcome. There is a higher
percentage of successful prediction on this 
basis—a sample of the patient's actual be- 
havior in treatment—than on the basis of the 
patient's state before he begins treatment. 

Likability has been rated from segments 
of tape recordings and is significantly related 
to the outcome of treatment (Stoler, 1963, 
1966). Under the heading Patient Factors 
Before Treatment, another study is positive 
(Strupp et al., 1963), and one is nonsignifi-
cant (Gottschalk et al., 1967). Patient “at-
tractiveness for psychotherapy” described
earlier may well be a similar concept, and was 
significantly related to outcome. In general, 
liking a patient may tend to be associated 
with the inclination to believe the patient is 
attractive as a patient for psychotherapy, and 
these judgments may actually provide favor- 
able conditions for growth, as Rosenthal and 
Jacobson (1968) found for school children. 

Much successful prediction has come from 
Rogers’ process scale composed of seven 
strands which, in his first paper on the topic 
(Rogers, 1959), he called relationship to feel- 
ing, degree of incongruence, manner of ex- 
periencing, communication of self, construing 
of experiencing, relationship to problems, and 
manner of relating to others. They have been 
altered slightly in later studies by Rogers 
and his students. One of the most repeatedly 
successful is manner of experiencing (a term 
first suggested by Gendlin). It implies that
the patient is capable of experiencing deeply
and immediately, and of being reflectively
aware of this feeling. A low score indi-
cates that the patient is remote from his experi-
encing and unable to understand its implicit
meanings. Scales have been developed by 
Gendlin, Beebe, Cassens, and Oberlander 
(1968) and others which show fairly good 
interjudge reliability; the scales are based 
upon very brief segments—4 minutes of the 
treatment session. Of the six studies involved 
with the Experiencing scale, all are positively 
and significantly related to the patient's im- 
provement (Gendlin, Jenney, & Shlien, 1960; 
Gendlin et al., 1968; Kirtner, Cartwright,
Robertson, & Fiske, 1961; Tomlinson, 1967;
Tomlinson & Hart, 1962; Walker, Rablen, &
Rogers, 1960). The finding by Gendlin et
al. (1968) that patients do better who start
treatment with a higher level of “process” is
consistent with the major trend for greater
assets before treatment to be a positive portent:
“the rich get richer.”

A long series of studies involving the dis- 
comfort-relief quotient (Dollard & Mowrer, 
1953) was based on objective word counts of 
patients’ statements. Some of the studies 
showed that decrease in the discomfort-relief 
quotient (increased comfort) indicates suc- 
cessful outcome of psychotherapy, and some 
were nonsignificant (Mowrer, Hunt, & Kogan, 
1953). 


II. THERAPIST FACTORS


Therapist Factors (Assessed Apart from the 
Session) 


Only three topics among the variety of ex- 
plored therapist factors have noteworthy re- 
lationships to outcome: the therapist’s level 
of experience, his skill, and his interest pat-
tern.

Thirteen studies were found dealing with 
level of experience. Eight of these showed a 
significant positive relationship to the pa- 
tient’s improvement (Barrett-Lennard, 1962; 
Cartwright & Lerner, 1963; Cartwright & 
Vogel, 1960; Katz et al., 1958; Knapp et al., 
1960; Miles et al., 1951; Myers & Auld,
1955; Rice, 1965). The four studies with 
nonsignificant findings were Fiske et al., 1964; 
Grigg, 1961; Mindess, 1953; Sullivan et al., 
1958. One study was difficult to classify be- 
cause it showed that inexperienced therapists 
performed well, but only under limited cir- 
cumstances (R. Cartwright & Lerner, 1963). 

In three of five studies, therapist's skill was
shown to positively influence recovery (Klein,
1965; Nash, Hoehn-Saric, Battle, Stone, Im-
ber, & Frank, 1965; Nichols & Beck, 1960);
nonsignificant were Imber, Frank, Nash, Stone,
and Gliedman (1957) and Muench (1965).

Several studies on therapists’ interest pat- 
terns, mainly using the Strong Vocational In- 
terest Blank with a key developed by Betz 
(1963) for Type A therapists (problem-solv- 
ing approach) versus Type B therapists 
(mechanical interests) showed significant re- 
lationships to patient improvement for schizo- 
phrenic patients. Most of the reports seem 
based upon the same or successive samples of 
Phipps Psychiatric Clinic patients and thera-
pists (e.g., Betz & Whitehorn, 1956; White-
horn & Betz, 1954, 1957), except for the study
by Lichtenberg (1958; also described by Betz, 
1963) which was based upon a sample from 
Sheppard and Enoch Pratt Hospital. How- 
ever, McNair, Callahan, and Lorr (1962) 
found a reverse effect: Neurotic patients of 
Type B therapists improved significantly more 
than neurotic patients of Type A therapists. 
A more careful evaluation also needs to
be made of the Type A versus Type B thera-
pist distinction. The replication by Stephens
and Astrup (1963, 1965), using the same 
Phipps Clinic patient sample but controlling 
for form of schizophrenia, revealed no rela- 
tionship between Type A or B therapists and 
discharge status. Also, multivariate taxonomic 
studies suggest more than two value types 
(cf. Welkowitz, Cohen, & Ortmeyer, 1967). 

Aside from predicting the outcome of psy- 
chotherapy, the A-B dichotomy has some 
substance, as shown by its correlations with 
other dimensions (see reviews by Carson, 
1967 and Silverman, 1967). A-type therapists, 
for example, are more field dependent than 
B-type therapists on the Witkin Rod and 
Frame test (Pollock & Kiev, 1963). 


Therapist Factors (Judged from the Sessions) 

Empathy may be a promising therapist 
variable, as judged from psychotherapy ses- 
sions (either by judges from the tape and 
transcript, or by patient or therapist from 
their experiences with each other in the 
therapy session). It is significant in three out
of six studies when rated from brief tape 
samples of the session (Rogers, Gendlin, 
Kiesler, & Truax, 1967; Truax, 1963; Truax 
et al., 1966), but the same or other measures 
of empathy are nonsignificant in studies by 
Bergin and Jasper (1969), Rogers et al. 
(1967), Mintz et al. (1971). In four out of 
six findings based on ratings by the patient 
and the therapist, empathy is significant (Bar- 
rett-Lennard, 1962; Cartwright & Lerner, 
1963; Feitel, 1968; Lesser, 1961) and non- 
significant for other measures (in Cartwright 
& Lerner, 1963; Lesser, 1961). When com- 
bined with Warmth and Genuineness, the 
predictive power of empathy is increased 
(Truax et al., 1966), and similarly with
other variables in the “Relationship Inven-
tory” (Barrett-Lennard, 1962).

The implication seems clear: The thera- 
pist’s empathy (and other related qualities) 
facilitates the patient’s gains from psycho- 
therapy. But what seems like a clear im- 
plication, on closer inspection may prove to 
be more complicated. The causal direction of 
the relationship may not be one-way: Pa- 
tients who are improving, or who reveal to the 
therapist their capacity to improve, may 
elicit from him more empathy, or may at- 
tribute to him more empathy! 


III. THE MATCH BETWEEN PATIENT AND
THERAPIST (ASSESSED APART FROM THE 
SESSIONS) 


Fourteen studies deal with some form of 
similarity between therapist and patient. Nine 
show a positive relationship: greater simi- 
larity is associated with better outcome 
(Graham, 1960; Hollingshead & Redlich, 
1958; Landfield & Nawas, 1964; Lesser, 
1961; Sapolsky, 1965; Schonfield, Stone, 
Hoehn-Saric, Imber, & Pande, 1969; Sheehan, 
1953; Tuma & Gustad, 1957; Welkowitz et 
al., 1967). Some measures were not significant: 
similarity of profile shape on the MMPI (Car- 
son & Llewellyn, 1966; Lichtenstein, 1966) 
and Q-sort similarity of patient's and thera- 
pist’s ideal selves (Hunt, Ewing, LaForge, & 
Gilbert, 1959). One report also contained a 
significant negative relationship between simi- 
larity of patient’s and therapist’s self-percep- 
tions and progress in treatment (Lesser, 
1961), and one a curvilinear relationship for 
MMPI profile shapes (Carson & Heine, 1962). 
The variety of forms of positive similarity 
includes social class, interests, values, and 
compatibility of orientation to interpersonal 
relations. A feeling of similarity seems to 
provide a more significant relationship be- 
tween the therapist and patient and, there- 
fore, a better outcome to treatment. 


IV. TREATMENT FACTORS


Different types of treatment probably have 
differential effects, but it is hard to know 
what type of treatment has what effect. The 
large array of quantitative studies contains 
only a few of each treatment type, and there- 
fore offers only a few tentative trends: 

1. Three studies compared individual versus
group psychotherapy. Baehr (1954) found 
individual psychotherapy to be slightly su- 
perior to group psychotherapy, but Imber et 
al. (1957) found no differences at the end 
of 6 months of treatment; Stone et al. (1961) 
found none 5 years later; Pearl (1955) found 
group treatment superior. No generalization 
is therefore possible. 

2. Three studies found the combination of 
individual and group psychotherapy was better 
than either individual or group psychotherapy 
alone (Baehr, 1954; Conrad, 1952; Peck, 
1949). 

3. Surprisingly few reports in the literature 
compare the effectiveness of psychotherapy 
and pharmacotherapy. The three studies we 
found of schizophrenic patients (Fairweather, 
Simon, Gebhard, Weingarten, Holland, Sand- 
ers, Stone, & Reahl, 1960; Grinspoon, Ewalt, 
& Shader, 1968; May, 1968) suggest that 
psychotherapy combined with pharmacother- 
apy is more effective than psychotherapy 
alone, but in most ways not more effective 
than pharmacotherapy alone. Similar trends 
emerged from studies of neurotic patients 
(Daneman, 1961; Gibbs, Wilkins, & Lauter- 
bach, 1957; Lorr, McNair, Weinstein, Mi- 
chaux, & Raskin, 1961; Lorr, McNair, & 
Weinstein, 1963; Rickels, Cattell, Weise, 
Gray, Yee, Mallin, & Aaronson, 1966; Roth 
et al., 1964), but this latter group is less well- 
controlled and the role of psychotherapy as 
a treatment method was limited. In all of 
these latter studies the treatment, generally 
on a once-a-week basis, did not exceed 8 
weeks. 


School of treatment usually made no mea- 
surable difference, according to a handful of 
studies where the most frequent comparisons 
involved client-centered, psychoanalytic, and 
Adlerian therapy (R. Cartwright, 1966; 
Heine, 1953; Shlien, Mosak, & Dreikurs,
1962; Tougas, 1954). 

It has been thought—at least since Otto 
Rank—that treatments which are structured 
from the outset as time-limited, might per- 
form as well as time-unlimited treatment. 
Most of the research so far has shown no 
significant difference between the two time 
conditions (Frank, Gliedman, Imber, Stone, 
& Nash, 1959; Henry & Shlien, 1958; Pascal
& Zax, 1956; Shlien, 1957; Shlien et al.,
1962). 

In 20 of 22 studies of essentially time- 
unlimited treatment, the length of treatment 
was positively related to outcome; the longer 
the duration of treatment or the more ses- 
sions, the better the outcome! It is a tempta- 
tion to conclude—and it may be an accurate 
conclusion—that if psychotherapy is a good 
thing, then the more the better. Other in- 
terpretations, however, may also fit: (a) Pa- 
tients who are getting what they need, stay 
in treatment longer; those who are not, drop 
out sooner. (b) Therapists may overestimate 
positive change in patients who have been 
in treatment longer. A complementary trend
may also operate: therapists often assume
some minimum number of sessions is needed
before real change can occur, so that early
dropouts tend to get poor outcome ratings. 

In three out of three studies, a long and 
mandatory wait between the time of applying 
for psychotherapy and beginning it is nega- 
tively related to outcome (Gordon & Cart- 
wright, 1954; Roth et al., 1964; Uhlenhuth 
& Duncan, 1968). Two main implications may 
be drawn from these results: First, the prac-
tice of using the patient as his own control by
having him wait for psychotherapy and re- 
testing him during the waiting period has in 
itself a negative impact on his future psy- 
chotherapy. The waiting experience may be 
a negative one and therefore cannot be ac- 
cepted as a neutral, no-therapy period. Sec- 
ond, clinics with long waiting lists should be 
aware of these studies, and should try to 
provide service close to the time the patient 
applies for it. 


Comparison of Results with those of “Drop- 
Out versus Stay-In Psychotherapy” 


Studies in which the criterion for outcome 
is “dropping out versus staying in treat- 
ment” have been excluded from the review, 
since there is no explicit evidence that this 
variable is consistently related to the amount 
of gain a patient makes. There is, however, 
some indirect evidence from studies showing 
that length of treatment is positively related 
to gain from psychotherapy. 
Fulkerson and Barry (1961) have reviewed
this area, but the only extensive survey is by 
Brandt (1965) of factors influencing patients 
to drop out. It provides some interesting com- 
parisons with the present review. Brandt con- 
cluded there is little uniformity in the data 
presented, the variables controlled, the vari- 
ables investigated, the base lines, the defini- 
tions, and the findings reported in 25 studies 
dealing with premature terminators among 
adult patients in long-term individual out- 
patient psychotherapy. Of the 29 variables 
investigated by 18 researchers and research 
teams (in Brandt's review), only sex, age,
and marital status consistently did not dif- 
ferentiate between dropouts and remainers. 
The only variables which consistently dif-
ferentiated between the two groups were per-
sonality characteristics. The personality char-
acteristics and methods for determining them
differed widely from one study to another. 

Brandt made no attempt to compare his 
findings with those for predicting outcome of 
psychotherapy in general; nor did he explain 
the consistencies in the data he presented. In 
his table summarizing the dropout studies 
which do or do not differentiate between re- 
mainers and terminators for 18 studies and 
29 predictors, he did not note that for educa- 
tion, five of the seven studies showed that 
higher education goes along with staying in 
treatment—as is noted in our review. For oc- 
cupation, four of the seven studies which 
mentioned occupational status showed a posi- 
tive relationship to remaining—almost the 
same as in our review. Marital status appears 
in six studies; in all six there is no differentia- 
tion between the two groups; this is also 
similar to our review in which four of the 
five studies showed no significant difference 
in marital status. For personality character- 
istics, seven out of seven differentiate. (Brandt 
does not indicate which personality charac- 
teristics were mentioned most frequently.) 
For the Rorschach, four out of six studies dif- 
ferentiate; this is similar to our review. Seven 
out of seven studies in which age is men- 
tioned showed no differentiation between re- 
mainers and terminators—this is not similar 
to our review, which often indicates age to 
be inversely related to amount of gain from
treatment. There was no difference between
the sexes in two out of two studies; this is
somewhat the same as in our review. 

Lorr, Katz, and Rubinstein (1958) re-
ported the cross-validation of a test battery
(Terminator-Remainer Battery) designed to 
predict early termination of psychotherapy. 
The Terminator-Remainer battery consists of 
subtests taken from the Manifest Anxiety 
Scale, the Behavior Disturbance Scale, and 
the F Scale, as well as sociability, ideal-actual 
self, education, vocabulary, and motivation 
(therapist's rating). Remainers have a history 
of being less impulsive—with less antisocial 
behavior and more anxiety, more self-critical- 
ity, and less inclination to endorse rigid irra- 
tional beliefs. They are also more retiring in 
interpersonal relationships, better educated, 
have better vocabularies, and are considered 
by therapists to be more highly motivated for 
psychotherapy. Three studies of the Terminator-
Remainer battery have shown useful predictive
power for a veteran population. However, few
of the ratings or tests (as compared with 
background factors) added significantly to 
the Terminator-Remainer battery. The one 
that added most was the therapist’s rating of 
patient’s motivation for psychotherapy. Re- 
mainers and terminators seemed to be two 
separate patient populations who reacted dif- 
ferently in psychotherapy. The Terminator- 
Remainer battery and other measures can 
identify many in these two groups. Therapists 
have some influence, but not a large one, on 
the proportions of both populations they can 
hold in treatment. 

In conclusion, many of these characteristics
of remainers versus dropouts seem similar to 
those from our review of factors influencing 
outcome of psychotherapy. 


OVERALL CONCLUSIONS ABOUT THE MAIN
FACTORS INFLUENCING OUTCOME OF
PSYCHOTHERAPY


Main Findings from the Survey 


It is easy to become overimpressed with 
the limits of the group of quantitative studies 
because of the limits of the individual studies. Most
of the single studies are weak reeds because 
of their small sample size, small number, unre- 
liability of measures, and brevity of treat- 
ment. Although we may sometimes be steered 
the wrong way because all the studies on
certain predictors may be subject to the same 
error, in taking them together and trying to 
discern agreements and disagreements, some 
consistencies emerge which probably will stand 
up to further testing. It adds fiber to a find-
ing when it is resilient enough to appear in
different groups and by different assessment 
methods. When there have been divergent re- 
sults on the same qualities, it has sometimes 
—though less often than anticipated—been 
possible to review the studies and locate the 
probable responsible agent in the nature of 
the patient groups or criterion measures. 
Much more of this type of reviewing remains 
to be done. 

The list in Table 1 contains the essence of 
the quantitative research on the predictors of 
benefiting from psychotherapy. It may have 
value as a guide in the selection of patients 
and therapists, as well as for elucidating the 
process of psychotherapy. A brief formulation 
of the necessary ingredients for psychother- 
apy, based on Table 1, follows: 

1. Most research conclusions have been
about the patient, especially of the patient as 
he was before treatment: the more adequate 
his general personality functioning, the better 
his future course in psychotherapy. Similarly, 
the higher his intelligence and other intellec- 
tual skills, the better his future in psycho- 
therapy. Patients most likely to succeed in 
treatment come highly motivated for it and 
expect it to help. The treatment is best begun
at a time when the patient is upset and shows
it by high levels of anxiety and distress and
the presence of other affects such as depres-
sion. Younger patients often are more pliable
and make more changes. Higher educational
attainment and other social achievements
probably are in part an expression of adequate
general personality functioning, intelligence, 
and motivation. During treatment patients do 
better who are likable and capable of deeply 
experiencing and reflecting on their experi- 
encing. 

2. Much less has been established about 
the therapist: his experience level and skill, 
as well as his ability to show empathy during 
the session, seem important. 

3. The match between the patient and 
therapist is facilitated by similarities in values, 
attitudes, interests, and social class. The pa-
tient’s intellectual and social attainments may
increase his sense of having more in com- 
mon with the therapist. 

4. The comparative studies of type, meth- 
ods, and schools of treatment are insufficient: 
those that exist are inconsistent. Treatment 
factors, therefore, have had little established 
about them (possibly such treatment factors 
are less potent than patient and therapist 
factors), except that those patients do better 
who start treatment when they apply for it 
(and presumably are more ready for it) and 
persist in it longer. Some studies indicate that 
time-limited therapy does not do worse than 
time-unlimited, and that group therapy added 
to individual therapy does better than either 
one separately. Psychotherapy with pharma-
cotherapy tends to be slightly more effective
than psychotherapy alone, but in most ways
not more effective than pharmacotherapy
alone, especially for schizophrenic patients.


Some Promising Predictive Combinations 


It seems a safe, overall conclusion that 
factors in the patient and therapist, and in
the patient-therapist interaction in
ness, level of anxiety, and capacity for ob-
ject relations.

4. Truax et al. (1966) found increased 
success in predicting improvement by com- 
bining empathy, warmth, and genuineness. 

5. A multidimensional prognostic index of 
32 items was developed both on clinical 
grounds and on the basis of this review and 
is being tested in several populations (Auer- 
bach & Luborsky; see Footnote 3). It should 
be useful for clinicians who wish to try this 
combination of promising variables as part 
of their evaluations of patients for psycho- 
therapy. No such index has ever been con- 
structed for nonpsychotic patients. It will be 
informative to compare its predictive power 
with indexes for psychotic patients (Lorr, 
Wittman, & Schanberger, 1951; Thorne, 
1952), and with indexes for predicting sus- 
ceptibility and recovery from physical illness 
(Thurlow, 1967). 

6. In no studies of nonpsychotic patients
has there been adequate representation of
both the therapist and patient variables; 
therefore no estimates can be made of the 
relative contribution of the patient versus 
therapist to the variance of outcome. Probably 
the sicker the patient, the less impact a 
therapist can have—assuming the generaliza- 
tion that openness to constructive change 
varies with severity of illness. However, it 
seems unlikely that the study by Astrup and 
Noreik (1966) on psychotic patients applies 
equally to nonpsychotic ones—that it is al- 
most entirely the patient's initial state which 
determines his future course either with or
without psychotherapy.


Comparison of the Quantitative Survey with
Clinical Wisdom 


Clinical knowledge has assembled much 
more wisdom than the quantitative has man- 
aged to garner. Although clinical lore has a 
higher percentage of error, it addresses itself 
to the array of issues which the clinician must 
confront now. What follows are a few of the 
outstanding discrepancies in the two litera- 
tures: 

1. Quantitative studies, in their stress upon 
the qualities of the patient and the relation- 
ship with the therapist, give far less weight 
to the technique of treatment than do clinical 
writings. Accuracy of interpretation has had
relatively little quantitative work done on it, 
except for the work on empathy (which may 
or may not be the same)—and here the ac- 
curacy of the empathy has not been ade- 
quately explored. It is damaging to some of 
the research on empathy that estimates of 
therapist's empathy can be reliably rated on 
the basis of the therapist's statements alone, 
independent of whether the judge has read 
what the patient has said (Truax, 1966). 

2. The tremendous emphasis in the clinical 
literature on the importance of providing in- 
sight to the patient has had no quantitative 
investigation in relation to outcome variables, 
except for two studies of initial insight which 
gave divergent results: Raskin (1949), Zolik 
and Hollon (1960). 

3. Nothing exists in the quantitative litera- 
ture on qualities of the patient which make 
him amenable to various forms of treatment. 
The clinical literature is full of such discus- 
sions, which are neatly summarized by Wal- 
lerstein et al. (1956). A frequent reason for 
conducting a diagnostic evaluation is to de- 
cide on the patient's suitability for psycho- 
analysis versus other forms of treatment. 
There are dozens of clinical articles on how 
to make this judgment, but no quantitative 
ones. At the present time the best conclusion 
from the quantitative literature is that those 
patients who are most suitable for psycho- 
therapy are also the ones most suitable for 
psychoanalysis. 

4. A large part of the clinical literature is 
based on long-term treatment; almost all the 
quantitative research is on short-term treat- 
ment. 


METHODOLOGICAL EVALUATION OF THE
STUDIES 


Evaluation of Criterion 


1. The most frequent criterion measure, 
and usually the only one, is the therapist's 
gross improvement rating. The fact that only 
one criterion measure is used is a significant 
limitation; the fact that it is an improvement 
rating rather than a raw difference score is 
psychometrically advantageous. On the other 
hand, the fact that it is provided by the 
therapist, a committed participant in the 
therapeutic exchange, is a distinct disad-
vantage. At too many points where a rela-
tionship is found (e.g., for empathy or length
of treatment), the causal relationship may be 
the reverse of what is supposed, or due to 
other factors. Although both the therapist 
and patient may be biased judges, their esti- 
mates have some face validity and should be 
used along with the estimates of outside 
judges. 

It seems unlikely that improvement ratings 
(whether by therapist, patient, or outside 
observer) should be given up (in favor of the 
“residual gain scores” or “target symptom 
approach” to be described). At first glance, 
improvement ratings seem poor because the 
judge can hardly be expected to recall the 
exact level at which the patient started as a 
base for estimating the improvement. A differ- 
ence score corrected for initial level might 
seem to be the only answer. On soberer re- 
flection, however, one should look more kindly 
upon the improvement score because the 
judge can give his own weight to the quality 
of the improvement (and he can be reminded 
of the initial level) while any difference score 
might be more arbitrary. 

2. Where several criterion measures are 
available, correlations among them are usu- 
ally low and often not statistically significant. 
The only criterion measure which tends to 
have consistent significant correlations with 
other criterion measures is the therapist’s 
ratings of success or improvement (e.g., Fiske 
et al., 1964). The fact that it is the therapist's
rating of improvement which is most used is
advantageous from this point of view, but
see paragraph 1 of this section.

3. Since many different kinds of changes 
occur in psychotherapy and measures of these 
are often not highly correlated, one cannot simply
speak of the predictors of change in psycho- 
therapy. According to Fiske et al. (1964), 
four main change factors are identifiable: 
favorable self-evaluation, adequacy as mea- 
sured by TAT, therapist's perception of 
change, and patient's reported symptoms. A 
predictor of one kind of change may not and 
probably will not predict another kind of 
change. We have not taken this into account 
in the summary of the number of
studies, but have tried to consider it in the
discussion. (Humor about psychiatry has long
recognized this: There was the man who had 
just paid $50 to a psychiatrist to cure his in-
feriority complex [that is, unfavorable self-
evaluation criterion] who left the session and
on his way home got fined $50 for talking
back to a policeman [that is, social-con-
formity criterion].)

4. Dropping out versus staying in psycho- 
therapy probably has some similarity to other 
outcome criteria such as therapist's rating of 
improvement, because: (a) The initial-state 
correlates of drop-out versus stay-in appear to
be similar to those for improvement ratings, 
as shown in a preceding section, Comparison 
of Results of Drop-Out versus Stay-In Psy- 
chotherapy. (b) Drop-out versus stay-in is 
probably related to the number of sessions; 
the latter is highly correlated with the usual 
criteria of outcome. (c) Patients who drop 
out early are often viewed as not improving. 

5. Few studies use as a criterion of change 
a “residual gain score,” that is, scores from 
which the correlation of “pre” and “post” has 
been removed, and none allow for the effects 
of error components of scores on correlations. 
Instead, they use as a criterion an improve- 
ment or success rating, or sometimes raw dif- 
ference scores. Those studies which use differ- 
ence scores tend to obtain zero or negative 
correlations with predictors (Cartwright &
Roth, 1957; Fiske et al., 1964; Luborsky,
1962). 

The use of raw difference scores $D = Y - X$
(“post” minus “pre”) is dubious on two
grounds: First, the intent in $D$ is to have a
measure free of differences in “pre,” that is,
for $r_{XD}$ to equal zero. Yet it is an algebraic
necessity that $r_{XD}$ not, in general, equal zero.
Thus, $D$ does not accomplish the purpose for
which it is intended. Further, the “finding”
that $r_{XD}$ is negative is virtually mandated by
the algebra, and moreover, this correlation is
spurious in that both $X$ and $D = Y - X$ share
the same measurement error in $X$, resulting
in the irony that the larger the error in $X$
(the lower $X$'s reliability), the larger (nega-
tive) is $r_{XD}$.

Second, the reliability of $D$ scores is gen-
erally poor, thus attenuating all correlations
or group differences which involve it. The
use of residual difference scores, $Y \cdot X$, that
is, $Y$ from which $X$ has been removed by
linear regression (or, equivalently, the use
of $D \cdot X$ “residual gain scores”), results in a
score which correlates zero with $X$. Although
this is an improvement over $D$ scores, it re-
moves the observed $X$ out of observed $Y$ (or
$D$), whereas what is wanted, as Tucker,
Damarin, and Messick (1966) argued, is true
$X$ out of true $Y$ (or $D$). Their “base-free
measure of change” $G$ replaces observed scores
with true scores in the residualization. Their
very useful article gives formulas for the re-
liability of $G$ scores, and the correlation of
$G$ with other single variables as well as other
$G$ scores. These formulas all require reliability
coefficients for $X$, which is a most desirable
bit of information, however the scores are to
be used. On psychometric grounds, if any
function of pre- and postscores is to be used
as a criterion measure, we would recommend $G$.
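
The argument about raw difference scores can be checked with a small simulation. The sketch below is illustrative only: the data are simulated, the function names (`pearson`, `simulate`) and all numeric settings are assumptions of this example, and true change is generated independently of true initial level so that any correlation between X and D arises from measurement error alone. It computes the correlation of X with the raw difference D = Y - X and with the ordinary residual gain (observed X removed from observed Y by linear regression). It does not implement the base-free measure G of Tucker, Damarin, and Messick (1966), which requires reliability coefficients.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def simulate(error_sd, n=5000, seed=0):
    """Return (r_XD, r_X_residual) for simulated pre/post scores.

    Assumption for illustration only: true change is independent of
    true initial level, so any X-D correlation comes from error in X.
    """
    rng = random.Random(seed)
    true_pre = [rng.gauss(0.0, 1.0) for _ in range(n)]
    true_change = [rng.gauss(0.5, 0.5) for _ in range(n)]
    # Observed scores = true scores + independent measurement error.
    x = [t + rng.gauss(0.0, error_sd) for t in true_pre]
    y = [t + c + rng.gauss(0.0, error_sd)
         for t, c in zip(true_pre, true_change)]

    # Raw difference score D = Y - X ("post" minus "pre").
    d = [yi - xi for xi, yi in zip(x, y)]

    # Residual gain: Y with observed X removed by linear regression.
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    residual = [(yi - mean_y) - slope * (xi - mean_x)
                for xi, yi in zip(x, y)]

    return pearson(x, d), pearson(x, residual)


if __name__ == "__main__":
    for error_sd in (0.3, 0.8):
        r_xd, r_x_resid = simulate(error_sd)
        print(f"error SD {error_sd}: r_XD = {r_xd:.3f}, "
              f"r_X.residual = {r_x_resid:.3f}")
```

Under these assumptions the correlation of X with D is negative and grows in magnitude as the error in X grows, whereas the residual gain correlates zero with observed X by construction.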

6. Almost all of these studies are geared to 
predict the amount of change. However, 
Brenman (1952) observed that a small change 
in a crucial area may make a huge difference 
for a patient. If the criterion measure were a
more tangible and reasonable one (for ex-
ample, the type of change the patient needs
or desires), the predictability of the criterion
might be increased. Only a few studies have 
taken this direction of trying to predict 
changes in certain areas. The Johns Hopkins 
group is one of the few who tried to predict 
changes in "target symptoms" (Battle, Imber, 
Hoehn-Saric, Stone, Nash, & Frank, 1966). 

7. Prediction might be more successful and 
more useful to the therapist when it is geared 
toward the prediction of the types of prob- 
lems that will occur in the course of the 
treatment. The majority of the predictions
made in the Menninger Foundation Psycho-
therapy Research Project were aimed toward
prediction of behavior during treatment (Sar-
gent, Horowitz, Wallerstein, & Appelbaum,
1968). Unfortunately, final analyses of data
are not yet complete. 

8. Very few of the studies have follow-up 
assessment, that is, an assessment beyond the 
end of the psychotherapy. This is a deficiency 
because some patients do continue to change 
in the year or two following the termination 
of their psychotherapy. However, it is not a
major deficit because correlations between
end-of-treatment assessments and follow-up 
assessments tend to be high. 


Predictor Problems 


1. Many studies make predictions from 
single predictors to a single criterion. Yet in 
making predictions clinically, we never rely 
on one predictor. A more desirable approach 
would be to use multiple predictors and, more-
over, to use them in multivariate form and
in ways which facilitate the discovery of 
nonlinear and configurational relationships 
(Cohen, 1968). Of the 166 studies, only a 
few are moderately comprehensive in the 
number of predictive (and criterion) vari- 
ables; hardly any used multivariate data- 
analytic forms. In the comprehensive category 
are the University of Chicago Counseling 
Center Study by Rogers and Dymond (1954); 
Fiske et al. (1964); the series at Johns Hop- 
kins Phipps Clinic, for instance, Frank et al. 
(1959); Gottschalk et al. (1967); Rogers et 
al. (1967); and the Menninger Study (Wal- 
lerstein et al., 1956). 

2. Most of the studies, if they do have 
more than one predictor variable, do not in- 
clude variables from both the patient and the 
therapist at the same time. 

3. Much of what is considered to be pre- 
diction seems really to be an evaluation of 
the patient as he is now, with the expectation 
that he will be somewhat the same later as 
he is now. (It is like the story of the man 
who asked of his Rabbi, “How will life be
for me if I move across the river?” The
Rabbi asked, “Well, tell me; how is life for
you now?” The man replied, “It is bitter.”
The Rabbi then forecast, “It will be bitter
across the river.”) There seems to be good
reason for this; for example, McNair et al. 
(1964) found that initial score on seven 
measures is the best predictor of later score 
on the same measures. The experience of 
prediction in the Menninger Study (Lubor- 
sky, 1962) also suggests that the best estimate 
of where the patient will be at the end of 
therapy is to be made from a proper evalua- 
tion of where the patient is now, then adding 
to that level some moderate but not too large 
increase. (There is a tendency for the judges 
to expect that patients who are very sick will 
not gain as much as they do, and a tendency
across all patients to expect more change to 
take place than actually does.) 

4. The predictors and the setting of treat- 
ment often interact. This has been investi- 
gated systematically only in terms of pre- 
dictors related to early release from mental 
hospitals (Cohen, 1968). Cohen found that a 
prognostic variable or set of variables de- 
pended on which other variables or sets were 
partialled out or controlled. Some variables 
(education, church attendance) are related to 
early release in opposite ways, from one hos- 
pital to another. Marital and hospitalization 
history variables are important predictors and 
are consistent across hospitals, as are some 
psychologist-rated admission symptom factors. 

5. A successful prediction of change has to 
take into account the type of change that the 
patient and therapist are aiming for. Not tak- 
ing this into account can lead to markedly 
different estimates of improvement or success 
of the treatment. One patient, for example, 
presented as his main target problem at the
beginning of treatment a hopelessness, futility,
and a suicidal inclination, all starting 6 
months earlier with the loss of a friend. The 
same patient had become an overt homosexual 
during the preceding 3 to 4 years. If a change 
from homosexuality to heterosexuality were 
taken by the therapist or patient as the target 
goal, the treatment would legitimately have 
been considered a failure. As it was, the pa- 
tient ended up the treatment still with homo- 
sexual tendencies, but feeling less hopeless and 
futile, and both the patient and therapist in-
dependently agreed upon the treatment's suc-
cess.

6. What is the best way to evaluate these 
predictive variables? Are tests adequate, or 
can most of what is needed be evaluated 
through nontest indicators? Most predictive 
variables can be evaluated by interviewing 
the patient, or through a sample of the early 
sessions of psychotherapy. A few can be 
evaluated through tests such as those for in- 
telligence and for general personality func- 
tioning (e.g., Rorschach). 


Other Problems of Method 


1. Almost all of these studies are based on 
groups of patients who are diverse in type 
and in initial severity of illness. Predictions
within more homogeneous subgroups should 
produce better results for some variables. On 
the other hand, other variables (e.g., age) 
might “wash out” in homogeneous subgroups. 
Luborsky (1962) and Luborsky and Schimek 
(1964) found that for the neurotic patients 
(rated above 50 on the Health-Sickness Rat- 
ing Scale), anxiety seemed to serve as an 
impetus to improve, but for the borderline 
or psychotic patients (50 or below on the 
Health-Sickness Rating Scale), anxiety did 
not or could not serve this useful purpose. 
In the present review we have noted the pre- 
dictors which apply to neurotic versus psy- 
chotic groups. Future prediction studies might 
succeed better by attempting to predict in 
even more homogeneous groups or, equiva-
lently, by the use of Group × Predictor in-
teraction variables. These are variables which
account for criterion variance due to pre- 
dictors correlating to different degrees in 
different groups (Cohen, 1968). 
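
A Group × Predictor interaction variable can be sketched as follows. The example is hypothetical throughout: the data are simulated, the group labels (1 for a neurotic group, 0 for a borderline or psychotic group) and the slopes of 0.5 and 0.0 are invented for illustration, and the function names are assumptions of this sketch rather than anything taken from the studies reviewed. It relies on a standard property of least squares: in the model y = b0 + b1*g + b2*x + b3*(g*x) with a 0/1 group code g, each group is fitted by its own line, so the interaction coefficient b3 equals the slope in group 1 minus the slope in group 0.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def interaction_coefficient(group, x, y):
    """Coefficient b3 in y = b0 + b1*g + b2*x + b3*(g*x), g coded 0/1.

    With all four terms present, least squares fits a separate line to
    each group, so b3 is the group-1 slope minus the group-0 slope.
    """
    x0 = [xi for g, xi in zip(group, x) if g == 0]
    y0 = [yi for g, yi in zip(group, y) if g == 0]
    x1 = [xi for g, xi in zip(group, x) if g == 1]
    y1 = [yi for g, yi in zip(group, y) if g == 1]
    return slope(x1, y1) - slope(x0, y0)


def simulate(n_per_group=2000, seed=1):
    """Simulated (hypothetical) data: the predictor relates to the
    criterion in group 1 (slope 0.5) but not in group 0 (slope 0.0)."""
    rng = random.Random(seed)
    group, predictor, criterion = [], [], []
    for g, true_slope in ((1, 0.5), (0, 0.0)):
        for _ in range(n_per_group):
            a = rng.gauss(0.0, 1.0)
            group.append(g)
            predictor.append(a)
            criterion.append(true_slope * a + rng.gauss(0.0, 1.0))
    return group, predictor, criterion


if __name__ == "__main__":
    g, x, y = simulate()
    print("pooled slope:", round(slope(x, y), 3))
    print("interaction coefficient:", round(interaction_coefficient(g, x, y), 3))
```

In such data the pooled slope understates the relationship in one group and overstates it in the other; the interaction coefficient recovers the difference between the groups.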

2. Almost all of these studies are based 
on relatively short-term treatment. Most of
the treatments run from fewer than 30 to no
more than 40 sessions. The only exceptions
are psychoanalytic treatments (Klein, 1960; 
Knapp et al., 1960); a psychoanalytic prac- 
tice survey (Hamburg et al., 1967); and the 
Menninger project (Wallerstein et al., 1956) 
which is still not complete partly because the 
research project itself becomes long-term when 
long-term treatment is being investigated. 
Essentially, then, the results of this review 
apply to short-term treatment. The factors in-
fluencing change might well be similar for
long-term treatment, but we can find little 
evidence from the quantitative literature. One 
slight evidence for similarity of some results 
for short- and long-term treatment comes from 
Cartwright, Kirtner, and Fiske (1963). When 
they selected a group of patients who had 37 
or more sessions and intercorrelated the cri- 
teria of change, the results were comparable 
to the entire sample: the change variables
were no more highly intercorrelated for this 
subgroup. 

3. In virtually every study, there is no
control over or systematic knowledge of the
obtaining of counsel through other than psy-
chotherapy. In only a few studies is an effort
made to find out about the use of other re-
sources; for instance, the “situational vari-
ables” interview in the Menninger study
(Sargent, Modlin, Faris, & Voth, 1959), or a 
questionnaire on seeking or receiving guidance 
from a variety of sources (Paul, 1967). In 
terms of drug research, it would be like test- 
ing responsiveness to a drug when allied sub- 
stances are freely available and no record is 
kept of what else has been ingested by the 
subject. 

4. In most studies no effort has been made 
to determine the quality of the psychotherapy 
offered to each patient; it must vary widely. 
The earliest remedy for this was begun by 
Rogers’ (1959) methods of scoring process. 

5. If there is an inclination to publish sig- 
nificant results and not to publish insignificant 
ones, it would influence what we have avail-
able to survey (Cohen, 1962). We have no
evidence that this happens more frequently 
in this area than in others. The percentage of 
nonsignificant studies reported may be of in-
terest: it is approximately 24%. From this
we see that investigators do seem to report, 
and editors accept, nonsignificant results 
fairly frequently; how often they refrain is 
not known. But bias against nonsignificant 
results is known to exist. Also, the larger 
percentage of significant results might be ex- 
pected—it is our faith that if an investigator 
has a hunch that a certain variable is sig- 
nificant, he is more apt to be right than 
wrong. 

6. Individuals must differ in their readiness 
to change, either with or without treatment, 
or regardless of what type of treatment is 
offered. We have placed this point near the
end of our review to emphasize it: Our sur- 
vey is limited to the factors which influence 
change as a result of psychotherapy only for 
those who have started psychotherapy. For 
most of the studies surveyed, therefore, it 
cannot be determined whether the type of 
individual who profits most would also have
profited from another form of treatment, or
from change-inducing experiences which 
usually are not designated as psychotherapy— 
or indeed from nothing more than the myste- 
rious changes attributed to the passage of 
time. The patient who presents himself with 
many assets for psychotherapy may be especi- 
ally capable of achieving his ends by a variety 
of means—by a variety of psychotherapies, 
by medications, or by other resources such as 
talking with the bartender, with friends, with 
his minister, or by “keeping his own counsel.” 
By applying the Prognostic Index for Psy- 
chotherapy or other prognostic instruments to 
groups treated by a variety of methods, we 
might learn more about the issue of general 
readiness to change. 


The Need for a New Cross-Validation Study 


The present review should lead directly to 
cross-validation studies of the predictors listed 
in Table 1. One of these (Luborsky & Auer-
bach; see Footnote 3) is now in progress. It incorporates the
essence of the promising predictors and some 
of the above methodological suggestions such 
as the use of long-term treatment, tape re-
cordings of the treatment to permit intensive
process scoring, and multivariate analysis
of the predictors with various criteria of
outcome.


Summary 


An exhaustive survey has been made of 
quantitative studies of factors influencing the 
outcome of psychotherapy. The content con- 
clusions are listed under the heading Overall 
Conclusions About the Main Factors Influenc- 
ing Outcome of Psychotherapy; the method- 
ological conclusions are listed under the head- 
ing Methodological Evaluation of the Studies 
—both in preceding sections of this review. 


3 Luborsky, L., & Auerbach, A. H. An 80-patient
predictive study. In preparation, 1970. 


92 LUBORSKY, CHANDLER, AUERBACH, COHEN, AND BACHRACH 


APPENDIX 


LEGEND 
0 = nonsignificant relationship.
+ = significant positive relationship (p < .05).
- = significant negative relationship (p < .05).
? = significant relationship (p < .05), but unclearly related to the main trend of
    the studies under that heading.
* = Therapist's rating of outcome was the criterion or one of the criteria.


I. PATIENT FACTORS INFLUENCING THE OUTCOME OF PSYCHOTHERAPY 


A. Personality Factors: Before Treatment 


1. Degree of Initial Disturbance versus Adequacy of Functioning


Initial Q-adjustment (client estimate)
unrelated to change in Q-adjustment (r = -.25)
    D. S. Cartwright & Roth (1957)

Initial “interview-diagnostician” ratings related to
therapist's (T's) posttreatment evaluation (r = .24).*
    Fiske et al. (1964)

Initial “dysfunction” (5-point scale) rated by judges
unrelated to change in dysfunction.
    Klein (1960)

Initial Health-Sickness Rating (HSR) (-.18 with change
in HSR).*
    Luborsky (1962)

Initial HSR (.71 with termination HSR and .54 with im-
provement rating by independent research team).*
    Luborsky (1962)

Degree of initial illness (measured by Rotter test and
Maslow Inventory) unrelated to outcome (on Sentence
Completion Test, Maslow Inventory, and Therapy
Movement scale).*
    Muench (1965)

Initial adjustment or integration unrelated (r = .08) to
“success.” *
    Seeman (1954)

More disabled patients (Ps) (Psychiatric Morbidity Scale)
had best response to brief psychotherapy (ratings by
independent research psychiatrists).
    Gottschalk et al. (1967)

Severe ego weakness and relative ego strength (extremes)
related to outcome.
    Karush et al. (1968)

Initial degree of maladjustment (MMPI) correlated
.50 with improvement (“discrepancies procedure”).*
    Apfelbaum (1958)

Degree of initial disturbance correlated (r = .60)
with success.*
    Strupp et al. (1963)

Initial adjustment unrelated to improvement (except
for discomfort scale; see section below, Number of
Complaints).
    Truax et al. (1966)

Global estimates of severity of illness unrelated to outcome.*
    Roth et al. (1964)

Degree of pathology (Behavioral Disturbance Scale)
unrelated to improvement ratings (232 Veterans
Administration [VA] outpatients).*
    Katz et al. (1958)

Less disturbance in personality structure related to
success in client-centered therapy.*
    Kirtner & Cartwright (1958a)


Miscellaneous Test Findings


Less pathology (MMPI, Cattell Personality Factor)
related to improvement (interviewer ratings).*
    Hunt et al. (1959)

Less deviant MMPI profiles related to improvement
(mainly neurotic Ps).*
    Sullivan et al. (1958)


Barron Ego Strength Scale


Ego strength with improvement (judges' ratings) in
three separate samples (.01 level for difference between
improved versus unimproved).
    Barron (1953b)

Ego strength unrelated for the total group of schizo-
phrenics (but positive for men and negative for women).
    Distler et al. (1964)

Ego strength.
    Endicott & Endicott (1964)

Ego strength unrelated to improvement (but related for
females: r = .23, p < .05, though not for males).*
    Getter & Sundland (1962)

Ego strength related to improvement.*
    McNair et al. (1962)

Ego strength.*
    Gallagher (1954)

Ego strength.*
    Sullivan et al. (1958)

Ego strength discriminated unimproved from greatly im-
proved groups (the extremes of a hospitalized sample in
psychotherapy).*
    Wirt (1955)

Ego strength related (p < .001) to improvement (also
supervisor ratings).
    Wirt (1956)


TAT


Initial TAT adequacy related to a residual gain score
showing decrease in MMPI Hs Hy elevations.*
    Fiske et al. (1964)

Ratings based on pretreatment TAT protocols related
to outcome.*
    Kirtner & Cartwright (1958a)


Rorschach: Klopfer Rorschach Prognostic Rating Scale
(RPRS)


RPRS (for normally productive Ps, mean RPRS higher for
improved Ps; unrelated for underproductive Ps).
    Bloom (1956)

RPRS (pretherapy weighted score with counselor rating
of P's success, but n = only 13).*
    R. D. Cartwright (1958)

RPRS (.43 with improvement for n = 21; .38 for 40 un-
treated Ps).*
    Endicott & Endicott (1964)

RPRS unrelated to improvement (in client-centered
counseling).*
    Fiske et al. (1964)

RPRS (.67, pretherapy total weighted score and a 2-point
criterion).
    Kirkner, Wisham, & Giedt (1953)

RPRS unrelated to outcome (judges' ratings).
    Filmer-Bennett (1955)

RPRS unrelated to change and symptom improvement.*
    Whiteley & Blaine (1967)

RPRS (.66 with outcome, judges' ratings)
    Mindess (1953)

RPRS discriminated between 19 most versus 16 least
improved.*
    Sheehan, Frederick, Rosevear, & Spiegelman (1954)


Rorschach: General Rorschach Scores


Rorschach (Harris-Christiansen Prognostic Index) un-
related to outcome (judges' ratings).
    Barron (1953a)

10 of 11 Rorschach signs differentiated most and least im-
proved among group of 36 promiscuous women in
“psychiatric treatment.” *
    Bradway, Lion, & Corrigan (1946)

“Inadequate” response to Card IV on Rorschach (as a
measure of attitudes toward authority) related to un-
favorable outcome.
    Dana (1954)

Three Rorschach scoring categories (R, Ch, C) out of 10
correlated .40 with outcome.*
    Endicott & Endicott (1964)

Global ratings based on Rorschach protocols unrelated
to outcome (follow-up social adjustment).
    Filmer-Bennett (1955)

Index based on 9 Rorschach signs used in combination
related (.43) to outcome (follow-up social adjustment)
    Filmer-Bennett (1952)

M, Non F + % EA and other scores by themselves
unrelated to outcome.
    Gaylin (1966)
Subjective ratings of Rorschach protocols related to
outcome (supervisors’ ratings). 

Four Rorschach scoring categories (M, FM, m, and shading) 
out of more than 15 related to outcome (judges' ratings).

Presence of negative Rorschach signs related to treatment 
failure; “positive” signs not found to be discriminating. 

No location determinant related to outcome. 


Of 11 Rorschach scores, none related to outcome (judges' 
ratings) in 51 VA outpatients. 

Three separate methods of Rorschach analysis* (also 
supervisors' ratings). 

Rated by general Rorschach analysis, a variety of 
personality factors related to outcome.* 

Sum of M, FM, and m responses differentiated most from 
least improved in 35 outpatients.* 

Specific scoring categories and overall clinical evaluations 
of Rorschach records differentiated improved and un- 
improved cases.* 

A discriminant function measure using total K, m, and R 
predicted change (p < .05) and symptomatic improve- 
ment (p < .01).* 


2. Diagnosis (Especially Absence of Psychotic Trends) 


Absence of social alienation-personal disorganization 
("schizophrenic-like" phenomena).* 

Absence of “schizophrenia” as presenting symptom: 
Fewer at end of completed treatments improved (in 
character structure and symptom cure). Anxiety group 
had more improved patients.* 

Absence of subclinical psychotic trends (inferred from 
MMPI and Rorschach). 

Psychoneurotic and psychosomatic Ps showed more im- 
provement than psychotic or character-disorder patients.* 

“Nonprocess” did better than “process” schizophrenia. 


A greater proportion of improved Ps were initially 
diagnosed as "psychoneurotic reaction."

Psychiatric diagnosis: schizophrenia or borderline state 
related to lack of symptomatic change. 

Obsessional Ps did better (p < .02) than other diagnostic 
groups; hysterics did very well or badly, depending on 
experience level of T. 

Anxiety neurotics showed more improvement than con- 
version hysterics, anxiety hysterics, or compulsive- 
obsessive Ps. 


3. Chronicity 


Overall posthospital adjustment was best for “short-term 
psychotics,” next best for nonpsychotics, and poorest 
for long-term psychotics.* 

Degree of chronicity (examination of records) for 100 
female inpatients unrelated to outcome. 

Duration of anxiety syndrome (examination of records) 
before hospitalization unrelated to outcome. 

Suddenness of onset related to social recovery (post- 
hospital adjustment) in psychotic mothers. 


+ 


oot + 


+++ 


+ 


+ + +++ 


Harris & Christiansen 
(1946) 
Kirkner et al. (1953)


Rioch & Lubin (1959) 
Rioch & Lubin (1959) 
Harris & Christiansen 
(1946) 
Roberts (1954)


Rogers & Hammond
(1953)
Rosenberg (1954)


Sheehan et al. (1954)


oue 


Siegel (1951) 


Whiteley & Blaine (1967) 


Gottschalk et al. (1967) 


Hamburg et al. (1967) 


Harris & Christiansen
(1946)
Katz et al. (1958) 


Stephens & Astrup 
(1963) 

Tolman & Mayer (1957) 

Karush et al. (1968) 


Knapp et al. (1960) 


Yaskin (1936)


Fairweather et al. (1960) 


Barry & Fulkerson 
(1966) 
Miles et al. (1951) 


Morrow & Robins (1964) 
Greater chronicity of present illness (first 10 years of illness)
related to outcome. 
Presence of precipitating event related to outcome. 


4. Motivation and/or Expectations 
a. Motivation 


Need to change (discrepancy score between P's self- 
description now, and as he would like to be) with four- 
component improvement criteria (p < .01).* 

Ps motivated by need to change were rated as more 
successful.* 

High acceptance of responsibility related (p < .05) to
degree of improvement.* 

Motivation (clinical judgment by psychiatric team; from 
case record put in low versus high groups) unrelated 
to outcome. 

Motivation for treatment related .48 (p < .01) to outcome.* 


b. Expectations 


P's expectations unrelated to improvement scores.* 

P's expectation of change unrelated to change (Q-sort). 
(Neither were T's nor combined P's and T's expectations.)

P's expectations of symptom reduction positively and 
curvilinearly related to perceived symptom reduction. 

Ps who expected more positive results changed more. 

Optimism about outcome. 


c. General 


Type of “transference expectations” unrelated to outcome. 

Congruent (e.g., help with personal problems) versus 
noncongruent motivation (e.g., somatic complaints). 

Fee-paying clients profited more from treatment than 
non-fee-paying clients.* 

"Much improved" group contained more of fee-paying 
clients in group of 210 Ps.* 

Self-referred versus court-referred. 

Preparation for treatment by role-induction interview.* 


5. Attitudes toward Self and Treatment

Acceptance of responsibility related to degree of movement 
in treatment.* 

Predictive index based on valence and direction of client 
statements correlated .90 with outcome.* 

Overall favorableness of conscious attitudes to psychiatric 
hospitals, psychiatrists, and treatment (on Attitude tests) 
related to outcome. On projective picture attitude tests, 
perception of treatment situation as neutral and hospital 
as supporting, and the P role as both active and passive, 
related to outcome.* 

Ps who were more favorably oriented toward counselor 
changed more. 

Initial attitudes rated from interview materials unrelated 
to outcome in 10 nondirective treatment cases.*

Six measures based on a 60-adjective feeling-and-attitude 
scale unrelated to outcome.* 


+ 
Uhlenhuth & Duncan
(1968) 
Karush et al. (1968) 


R. D. Cartwright & 
Lerner (1963) 


Conrad (1952) 
Schroeder (1960) 
Siegal & Fink (1962) 


Strupp et al. (1963) 


Brady et al. (1960) 
Goldstein (1960) 


Goldstein & Shipman 
(1961) 

Lipkin (1954) 

Uhlenhuth & Duncan 
(1968) 


Apfelbaum (1958)
Gliedman et al. (1957) 


Goodman (1960) 
Rosenbaum et al. (1956) 
Mindess (1953) 
Hoehn-Saric, Frank, Imber,
Nash, Stone, &
Battle (1964)


Schroeder (1960) 
Blau (1950) 


Brady, Zeller, & Reznikoff


Lipkin (1954) 


Raskin (1949) 


Roth et al. (1964) 
6. Ability and Adequacy of Intellectual Functioning
a. IQ estimates 


Four Wechsler Adult Intelligence Scale (WAIS) subtests
(.46 with improvement).

Most successful of a group of 100 VA outpatients scored 
higher on the Army Alpha. 

Four WAIS subtests (.36 with TAT adequacy “residual 
gain," not with improvement).* 

IQ as measured by four subscales of the Wechsler-Bellevue Harris & Christiansen 
Intelligence Scale was unrelated to outcome. (1946) 


Barron (1953a) 
Casner (1950) 


Fiske et al. (1964) 


e + + + 


“Definitely better” outcome group had higher (p < .01) + Miles et al. (1951) 
Wechsler-Bellevue IQs than the “unchanged” group. 
Intelligence (standard intelligence test). + Zigler & Phillips (1961) 
Intelligence.* 0 Rosenbaum et al. (1956) 
Wechsler-Bellevue full-scale IQs related to outcome; lower + Rioch & Lubin (1959) 
scores associated with failure. But adequate performance 
was not discriminating. 
Full-scale WAIS (improved versus unimproved, p < .01).* + Rosenberg (1954) 
b. General estimates of intellectual skills 
Gough MMPI scale of intellectual efficiency.* + Sullivan et al. (1958) 
Vocabulary and word fluency (with criteria at 3-year + McNair et al. (1964) 


status of symptom reduction .42; improvement rating .24; 
self-rating .29).* 

General ability tests (including cube test, Stroop ratio, 0 Seeman (1962) 
concealed figures, autokinetic effect, flicker-fusion, mirror 
test) unrelated to outcome. 

Abstract reasoning and other cognitive and perceptual- + Barry & Fulkerson 
tests correlated (about .30) with outcome. (1966) 


7. Patient "Likability"*


Therapist liking for P correlated .29 with success in + Strupp et al. (1963)
treatment.*
Therapist liking for P (by T's attitude inventory). 0 Gottschalk et al. (1967) 


8. Attractiveness, Suitability, or Prognosis in Psychotherapy 


Rating of “attractiveness” (by independent interviewer) or + Nash et al. (1965) 
suitability for psychotherapy based on age, education, 
occupation, ability to relate, verbal facility, etc. These 
seem to comprise social class status and P's achievements). 

Prognosis correlated .48 (p < .01) with "success."* + Strupp et al. (1963)


9. Affect 
a. Anxiety 


Anxiety rated from tape-recorded speech sample.* + Gottschalk et al. (1967)
Anxiety as a presenting symptom (according to T); more + Hamburg et al. (1967)
of such Ps who complete treatment improve in character
change than would be expected by chance.*
Anxiety (measured by MMPI) unrelated to outcome.* 0 Bergin & Jasper (1969)
Initial anxiety level correlated .66 for Ps whose initial + Luborsky (1962)
health-sickness was 50 or greater; .11 for ratings below
50.


5 See also Section ID: Patient Factors as Judged from the Treatment. 
Initial anxiety level unrelated to outcome.*

Disturbance in impulse life, external causation of discomfort, 
and anticipating punishment from external sources
(anxiety?) are greater in failure than in success groups. 

Taylor Manifest Anxiety related to outcome for schizo- 
phrenic women; 
unrelated for schizophrenic men. 


Taylor Manifest Anxiety related to outcome ratings (multi- 


criteria). 
Taylor Manifest Anxiety unrelated to outcome (232 VA 


outpatients).* 


b. Other affects 


Absence of flattening of affect, or emotional blunting 
(rated from case history; mainly schizophrenic Ps). 

Ps who began treatment with mobilized negative affect im- 
proved more (anxiety, fear, hostility, depression).* 

MMPI Depression Scale.*

MMPI Depression Scale.


Gottschalk Hostility-Inward scale.* 


c. Number of complaints (amount of discomfort) 
Higher scores on personal discomfort scale (symptom 
checklist) related to greater success of treatment 


(p < .05).* 
Number of complaints (discomfort scale) related to improve- 


ment (.67).* 


10. Authoritarianism 


California F-scale unrelated to outcome (232 VA out- 
patients).* 


11. Ethnocentrism 
Ethnocentrism scale (-.64 with improvement); (with in- 
telligence partialled out, the r remains significant: —.34). 
Ethnocentrism scale (as E-scores increase beyond the group 
mean, failure cases increase).* 
Ethnocentrism scale.* 


12. Human Relations Interest 
Interest in human relations (a scale applied to speech 


samples).* 
Number of positive indicators of object relatedness 
measured by object relations technique related (p < .01) 


to outcome.* 
Relatability (i.e., quality of object relationships measured 


by TAT) related to improvement (judges' ratings). 
Ability to develop interpersonal relations.* 
13. Coping or Defensive Style 
Negative and demanding attitude associated with a lack of
success.*
Roth et al. (1964)

Kirtner & Cartwright 
(1958a) 

Distler et al. (1964) for 
women 

Distler et al. (1964) for 
men 

Gallagher (1954) 


Katz et al. (1958) 


Astrup & Noreik (1966) 
Conrad (1952) 


Gallagher (1954) 

Uhlenhuth & Duncan 
(1968) 

Gottschalk et al. (1967) 


Stone et al. (1961) 


Truax et al. (1966) 


Katz et al. (1958) 


Barron (1953b) 
Tougas (1954) 


Rosen (1954) 


Gottschalk et al. (1967) 
Rayner (1964) 
Isaacs & Haggard (1966) 


Rosenbaum et al. (1956) 


Conrad (1952) 
Defensiveness unrelated to outcome for 10 Ps in client-
centered psychotherapy.* 

Defensiveness correlated —.38 with success (p < .01).*

Defensiveness related negatively to improvement. 

Immediacy with which one handles feelings-in-relationship 
problems differentiated the outcome groups.* 

Less rigidity (in rated improved versus unimproved, 
p < .01).* 

Less stereotypy (in rated improved versus unimproved, 
p < .01).*


14. Somatic Concern 


Health concern (in rated improved versus unimproved, 
p < .05).* 

Somatic and psychological complaints during the pre- 
treatment period.* 


15. Self-Awareness, Insight, and Sensitivity 


Ps who could verbalize feelings and use intellectual controls 
improved more.* 

Initial understanding or insight unrelated to outcome in 
10 Ps in nondirective psychotherapy.* 

Sensitivity (rated in improved versus unimproved).* 

Ps with more insight showed more improvement.* 

Insight.* 


16. Miscellaneous Test Findings 
MMPI General 


Mf scale score (Feminine attitudes ?) characterized the most 
improved in group of 100 VA outpatients. 

F scale (Test-taking attitude) negatively correlated (—.34; 
p < .05) with outcome (judge's rating). 

K scale (Test-taking attitude) positively correlated (.39, 

p < .05) with outcome. 

Pd, Pa, Sc, and Ma differentiated outcome groups; overall
ratings by most experienced judges were related to
outcome (p < .02).

F, D, Pd, Pt, Sc, and Feldman Prognostic Scale were 
related to improvement (p < .05).* 


Figure-Drawing Test 


Primitivity versus sophistication in drawing of head: 12 of 
19 unimproved Ps predicted by primitive drawing; only 
one who improved would have been falsely predicted 
(.03 level); sophisticated drawing of head may reflect
greater positive relationships with people, or higher
intelligence. 


General Findings on Psychological Tests 


Difference in ascending and descending critical flicker- 
fusion measures correlated with outcome (multiple 
ratings). 

Psychiatric Attitudes test related to improvement.* 


+ + +1 
Raskin (1949)

Strupp et al. (1963) 

Zolik & Hollon (1960) 

Kirtner & Cartwright 
(1958b) 

Rosenberg (1954) 


Rosenberg (1954) 


Rosenberg (1954) 


Stone et al. (1961) 


Conrad (1952) 
Raskin (1949) 
Rosenberg (1954) 


Zolik & Hollon (1960) 
Rosenbaum et al. (1956) 


Casner (1950) 

Endicott & Endicott 
(1964) 

Endicott & Endicott 
(1964) 

Harris & Christiansen 


(1946) 


Hunt et al. (1959) 


Fiedler & Siegel (1949) 


Barry (1962) 


Brady et al. (1959) 
Cultural as opposed to occupationally oriented interests
(measured by Kuder Preference Exam) characterized 
most improved of 100 VA outpatients. 

Gain scores in 9 perceptual-motor and conceptual tests
failed to relate to improvement.*

"Paranoid Schizothymia" and "Hysterical Unconcerned"
scales of Cattell's 16 Personality Factor Scale related
(p < .01) to improvement.*

Nonparticipation, passivity versus responsibility, per- 
sistence of characters in TAT stories (p < .01).* 

Behavioral Disturbance scale unrelated to improvement.* 


0 


B. Demographic or Life-Situation Factors 


1. Age


Age unrelated to outcome.* 

Age unrelated to outcome. 

Ps under 30 years showed more improvement than did 
older Ps. 

The 25 most improved Ps (in sample of VA outpatients) 
were about 5 years older than the least improved Ps 
(mean 32 versus 27).* 

Age unrelated to outcome (also Q-sort).* 

Age unrelated to improvement in brief treatment.* 

The 46+ age group improved less.* 

For 27 analytic Ps, correlation of .47 (p < .05) between 
age and outcome, with older Ps improving more (but 
note narrow age range of 20-40, mean 27). 

Age unrelated to outcome (23 Ps).* 

Ps with successful outcome ratings tended to be younger 
(p < .05).*

Age related to outcome. 


2. Sex


P's sex unrelated to outcome. 

P's sex unrelated to outcome.* 

P's sex unrelated to outcome.* 

P's sex unrelated to outcome (27 analytic Ps). 

Process schizophrenic Ps' sex unrelated to outcome.* 

Females did significantly better than males (criterion: rating 
of success and satisfaction).* 

Females did significantly better than males. 


3. Social Achievements


a. Socioeconomic level (social class) 


Hollingshead two-factor index of social position.* 

Lower socioeconomic status (H-R scale Groups 4 and 5) 
associated with greater improvement in brief psycho- 
therapy.* 

Occupation level and annual earnings uncorrelated with 
improvement (232 VA outpatients).* 

Higher social class correlated with better outcome 
(criteria at 3-years status: self-rating .23; symptom 
reduction .37). 

High social status and financial success related to
improvement.


Teo 


Te E oo | 


+ 


+ +ocsoc 




Casner (1950) 


Fulkerson & Barry 
(1966) 
Hunt et al. (1959) 


Rayner & Hahn (1964) 


Katz et al. (1958) 


Bloom (1956) 
D. S. Cartwright (1955) 
Casner (1950) 


Conrad (1952) 
Gaylin (1966) 
Gottschalk et al. (1967) 


Hamburg et al. (1967) 
Knapp et al. (1960) 


Seeman (1954) 
Stone et al. (1961) 


Zigler & Phillips (1961) 


D. S. Cartwright (1955) 
Gaylin (1966) 

Hamburg et al. (1967) 
Knapp et al. (1960) 
May (1968) 

Mintz et al. (1970) 


Seeman (1954) 


Brill & Storrow (1960) 
Gottschalk et al. (1967) 


Katz et al. (1958) 


McNair et al. (1964) 


Rosenbaum et al. (1956) 
b. Occupational adjustment


Unemployment associated with lack of improvement, but 
occupational maladjustment characterized most improved 
of fully employed group. 

Most improved of 50 VA outpatients had been employed at 
beginning treatment and had better occupational ratings. 

Prior occupational adjustment. 

Work adjustment.* 

Better jobs.* 

Initial employment status. 


c. Marital or sexual adjustment 


Marital status unrelated to improvement in brief psycho- 
therapy.* 

No difference in prior marital adjustment of the three out- 
come groups. 

Marital adjustment.* 

Ps with successful outcome had fewer signs of sexual 
maladjustment before treatment (p < .05).* 

Marital status.* 


d. Education 


Education.* 

Ps with 12 or more years of education improved more than 
less educated Ps. 

Higher education level associated with more favorable 
outcome.* 

Education unrelated to outcome for 27 analytic Ps. 

Higher educational achievement.* 

Education.* 

Higher educational achievement.* 


e. Overall social competence 


Socially ineffective Ps improved more.* 


4. Early Home Situation


Nonsignificant trend: Ps in the two most improved groups. 
Ps who had more favorable early childhood environment 
were more likely to be seen as improved.* 


5. Student Status


Students were more successful than nonstudents. 
Fulltime college students improved more than nonstudents. 
Students versus nonstudents. 

Students versus nonstudents.* 


6. Previous Psychotherapy


Previous treatment made no significant difference in 
outcome.* 

Presence, amount, duration of previous treatment. 

Previous therapy unrelated to improvement.* 
Conrad (1952)


Miles et al. (1951) 
Rosenbaum et al. (1956) 
Sullivan et al. (1958)
Tolman & Mayer (1957) 


Gottschalk et al. (1967)


Miles et al. (1951) 
Rosenbaum et al. (1956) 
Stone et al. (1961) 
Tolman & Mayer (1957)


Bloom (1956) 
Casner (1950) 


Sullivan et al. (1958) 
Stone et al. (1961) 


Miles et al. (1951) 


Hamburg et al. (1967) 

Knapp et al. (1960) 

McNair et al. (1964) 
Rosenbaum et al. (1956) 
Rosenbaum et al. (1956)


D. S. Cartwright (1955) 

Casner (1950) 

Gaylin (1966) 

Rogers & Dymond 
(1954) 


Hamburg et al. (1967) 


Klein (1960) 
McNair et al. (1964) 
7. General Demographic
Demographic factors (unspecified) 


Ps with professional occupations and Ps who are psy- 
chiatrists or analytic candidates are more likely to com- 
plete treatment than general population.* 

Religious activities associated with lack of change.* 


C. Physiological Factors 


Low blood flow in calf of leg: hospitalized Ps with subsequent 
clinical improvement (19 out of 20) had lower blood flow 
response to reassurance from psychotherapist (p < .05), 
as compared with reassurance from the psychiatrist. 

Low blood flow in calf of leg: only physiological variables 
which correlated with changes in symptom list before 
and after psychotherapy (p < .01) (in 30 nonpsychotic 
Ps) 
Bailey, Warshaw, &
Eichler (1959) 
Hamburg et al. (1967) 


Rosenbaum et al. (1956) 


Clancy & Vanderhoof 
(1963) 


Vanderhoof & Clancy 
(1964) 


D. Patient Factors as Judged from the Treatment 


1. Likability 
More successful Ps were more liked than the less successful 
Ps (p < .05); (10 raters of 2-minute tape segments).
Likability predicted percentage of time out of hospital and 
global rating of changes in the test battery. (Ratings by 

12 psychiatric residents on 28 schizophrenic Ps.) 


2. Problem-Solving Attitudes 


How P conceptualizes (Kirtner typology) and attempts to 
resolve his problems (with residual gain in score by 
interviewer-diagnostician .39).* 

How P conceptualizes and attempts to resolve his 
problems* 

Ps who showed early increases in their reported positive 
actions toward self and others and increased in their 
positive evaluation of others were rated as more improved 
at termination.* 


3. Experiencing (and “Process Scale") 


Successful P is one who moves from reporting his feelings to 
expressing them directly.* 

Ps improve most who start at a high level of process.* 

Experiencing based on ratings of initial in-therapy behavior 
differentiated the outcome groups.* 

Process scale distinguished most successful from least 
successful cases (p < .05) and indicates that more success- 
ful cases begin and end at higher levels of process.* 

Process change, especially “personal constructs" and “re- 
lationship" : greater in six more successful than in six less 
successful Ps.* 

Process scale correlated (.89) with outcome. 


4. Other Patient Factors 


"Complexity of language" (2): The combination of number 
of syllables per word used by P and T with T's
success ratings (.42) and Q-adjustment change (.39) 
(from 10-minute taped samples of first two interviews;* 
could also be listed under T factors). 
Stoler (1963)


Stoler (1966) 


Fiske et al. (1964) 


Kirtner & Cartwright 
(1958b) 
Rosenman (1955) 


Gendlin et al. (1960) 


Gendlin et al. (1968) 
Kirtner et al. (1961) 
Tomlinson & Hart (1962) 
Tomlinson (1967) 


Walker et al. (1960) 


Barrington (1961) 
Increased maturity of behavior (reported during psychotherapy) + Hoffman (1949)
is greater among more successful than less
successful cases.

Patient health (rated from 5-minute typed samples) related + Mintz et al. (1970)
to criterion judgment of success and satisfaction.*
Discomfort-Relief quotient from casework records (with 0  Mowrer et al. (1953) 


criterion measures from social caseworker's judgment 
of movement).* 

Discomfort-Relief quotient in several studies based on + Mowrer et al. (1953)
psychotherapeutic and counseling records showed high rs 
with T estimate of success and other measures of change— 
the quotient moved toward greater comfort.* 

Mean type-token ratio (ratio of number of different words + Roshal (1953) 
to total words) increased for the more successful versus 
the less successful group— that is, the ratio moved toward 
greater diversity of words.* 


II. THERAPIST FACTORS INFLUENCING OUTCOME OF PSYCHOTHERAPY


A. Therapist Qualities 
1. Training and Experience 


Patients treated by more expert therapists (versus less +  Barrett-Lennard (1962) 
expert therapists) gave higher scores to their Ts on rating 
level of regard of T, empathic understanding, and congru- 
ence. (Level of these variables related to change, and 
treatment by experts was longer.)* 
Ps of experienced Ts showed more improvement. + R. D. Cartwright & 
Vogel (1960) 
Experienced Ts achieved improvement (and empathy) with + R. D. Cartwright & 


same-sex Ps with whom psychological distance was im- Lerner (1963) 
mediately reduced. 

Inexperienced Ts achieved improvement (and empathy) —? R. D. Cartwright & 
with opposite-sex Ps with whom distance was im- Lerner (1963) 


mediately increased. 


Experience (square root of number of Ps treated by T).* 0 Fiske et al. (1964)
Degree of counselor experience (PhD, experienced trainee 0 Grigg (1961) 
with 1 year internship, versus inexperienced trainees) 
unrelated to P reports of outcome. 
T's years of experience related to outcome (p < .01). + Katz et al. (1958)
Outcome less favorable in first than in subsequent analytic + Knapp et al. (1960)
control Ps.
Degree of T's experience. + Miles et al. (1951)
Nonsignificant trend for Ps of most experienced Ts to 0  Mindess (1953) 
demonstrate greatest improvement. 
Experienced staff versus residents (p < .05).* + Myers & Auld (1955) 
Experienced Ts were more often characterized by a + Rice (1965) 
particular style of treatment participation.* 
Experience.* 0 Sullivan et al. (1958)
2. Personal Analysis 
Personal analysis of T unrelated to P outcome.* 0 Katz et al. (1958)
Months of personal analysis of T.* 0 McNair et al. (1964)


* This finding is consistent with findings on degrees of initial disturbance; see Section IA:1: Degree of
Initial Disturbance versus Adequacy of Functioning; and Section IA:2: Diagnosis (Especially Absence of
Psychotic Trends).
3. Skill


Nonsignificant differences in patient outcome for 3 Ts, each 0 Imber et al. (1957) 
with 18 Ps (with improvement in social competence). 

Outcome in 103 outpatients unrelated to different Ts' skills.* 

Skill related to two change factors rated at termination of 
therapy by T and P.* 

Ts with best therapeutic behavior (judged from taped
sessions) had best outcomes.* 

63% of Ps treated by psychoanalytic students of superior 
skills showed substantial change, with only 39% for the 
above average and 28% for the below average in psycho- 
analysis. 


Muench (1965) 
Nichols & Beck (1960) 


Nash et al. (1965) 


+ + +o 


Klein (1965) 


4. Expectations 


T's initial assessment of P's ranking of the importance of 0 Parloff, Iflund, & Gold- 
various topics for psychotherapy was not significantly stein (1958) 
different for one improved and one unimproved P.* 


5. Interest Patterns (from Strong Vocational Interest Blank) 


Type A therapists (active personality-oriented problem- + Betz & Whitehorn (1956) 
solving approach, more expressive) were more effective 
than Type B therapists (regulative, mechanical interests) 
with schizophrenic Ps.* 

Type A therapists were more effective in treating schizophrenic + Betz (1963)
Ps than were Type B therapists.*

Type A versus Type B: Patients of Betz Type B therapists — McNair et al. (1962) 
improved more than patients of Type A therapists. (Re- 
sults differ from Betz and Whitehorn's, but were based 
upon different kinds of Ps.)* 

Type A versus Type B: No relationship between treatment 0 Stephens & Astrup
by A or B therapists and discharge status. (Results were (1963, 1965)
dependent on clinical status of P when he came for
treatment, not type of therapist).*


Type A therapists were more effective in working with + Whitehorn & Betz
schizophrenic Ps than were Type B therapists.* (1954, 1957)
Interests similar to successful psychiatrists (Strong Voca- + Uhlenhuth & Duncan 
tional Interest Blank) (1968) 


6. Attitude Toward Treatment 


Therapist's attitude toward treatment of process schizo- 0 May (1968) 
phrenic Ps.* 


T's interest in psychotherapy.* + McNair et al. (1964)
Favorableness of T's attitude toward psychiatry and + Goldstein & Shipman
treatment related to P's symptom reduction. (1961)

7. General Qualities
Personality characteristics of counselors rated by peers.* 0 Aronson (1953) 
Ability to enter the phenomenological field of another, 0 Ashby, Ford, Guerney, 
sympathetic interest, acceptance of the value system of & Guerney (1957) 


others, social stimulus value to associates, need to 

aggrandize self, and aggressiveness unrelated to changes 

in adjustment (Maladjustment Index ?). 

Depression (judged from MMPI).* 0 Bergin & Jasper (1969)
Anxiety (judged from MMPI).* 0 Bergin & Jasper (1969) 
B. Demographic Variables of Therapist


Sex of therapist.* 0 Sullivan et al. (1958) 

Professional discipline (psychiatrist, psychologist, social 0 Sullivan et al. (1958) 
worker).* 

Most successful therapists were born to unusually young + Uhlenhuth & Duncan 
or, especially, to unusually old mothers. (1968) 


C. Therapist Factors as Judged from the Treatment 


1. Empathy
a. Empathy as judged from sessions (process measures) 
Accurate Empathy (from tape segments) unrelated to 0 Bergin & Jasper (1969) 
outcome.* 
Accurate Empathy related to a decrease in the MMPI Sc + Rogers et al. (1967) 
(schizophrenia) scale, but unrelated to other measures 0 Rogers et al. (1967) 
of change (Ward Behavior rating).* 
Accurate Empathy (from tape segments). + Truax (1963)
Accurate Empathy (from tape segments).* +  Truax et al. (1966) 
T Empathy (judged from tape segments and tapes of 0 Mintz et al. (1970) 
entire sessions).* 
b. Empathy as judged in other ways
Relationship Inventory variables: T's regard for P (em- + Barrett-Lennard (1962) 
pathic understanding and congruence) rated by Ps after 
five sessions predicted change measures by T (p < .001) 
for low-change versus high-change Ps (the same findings
were obtained when rated by T, though not as strongly
as when rated by P).*
Empathy (definition: at close of treatment Ts of improved + R.D. Cartwright & 
Ps agreed more (p < .02) with P's self-image). Lerner (1963) 
T's initial ability to understand P's pretherapy self-image. 0 R. D. Cartwright &
Lerner (1963)
Empathic understanding (Barrett-Lennard Inventory) + Feitel (1968) 
related to improvement (r = .34).*
Empathic understanding scale with counseling progress 0 Lesser (1961)
(Q-sort criterion).
Refined empathy score of Bender and Hastorf (with -- Lesser (1961)
counseling progress). 
2. Unconditional Positive Regard (UPR)
a. UPR judged from sessions
Unconditional positive regard. + Truax (1963)
b. UPR judged by Barrett-Lennard Inventory
Unconditional positive regard.* + Barrett-Lennard (1962)
Unconditional positive regard.* + Feitel (1968)
3. Genuineness
Self-congruence (genuineness). Taped samples of Ts' improved + Truax (1963)
Ps rated higher (p < .05).
Genuineness (taped samples).* + Truax et al. (1966)
4. Nonpossessive Warmth
Nonpossessive warmth (with global improvement rated —  Truax et al. (1966) 


by P) (p < .05).*
5. Other Therapist Factors and Combinations Judged from Treatment


Assumed similarity: same-sexed Ps who improved were ? 
early accepted by T as like himself. 

P self-rating of feeling understood by T related to improve- 
ment (r = .59).* 

P rating of T on Regard plus Understanding (Barrett- 
Lennard Inventory) with improvement (r = .77).* 

High T empathy in context of low T directiveness, or low 
T empathy in context of high T directiveness, related to 
outcome (judgment of P's success and satisfaction).* 

Type II T behavior (taped sample from second session) 
correlated with T's (-.45) and P's (-.49) outcome
ratings. (Type II = distorted voice quality; language not
fresh or connotative; mainly T joins in self-observing the 
P; not exploring of inner experience.)* 

Type III T behavior (expressive style, freshness of word 
combinations, etc.) related to success (p < .05) in
client-centered therapy.

Accurate Empathy-Warmth-Genuineness combined. (Ts 
providing high conditions had 90% improvement versus 
Ts with lower conditions had only 50% improvement.)* 
R. D. Cartwright &
Lerner (1963) 
Feitel (1968) 


Feitel (1968) 


Mintz et al. (1970) 


Rice (1965) 


Rice (1965) 


Truax et al. (1966) 


III. THE MATCH BETWEEN PATIENT AND THERAPIST


A. Similarity
1. MMPI Similarity


Similarity in MMPI profile shape between P and T: 
Relationship curvilinear with either extreme similarity or 
dissimilarity associated with lower success ratings. 

Similarity in MMPI profile shape between P and T.* 




Similarity in MMPI profile shape between P and T un- 
related to outcome (an attempt to replicate work of 
Carson & Heine, 1962).* 


2. Rorschach Similarity


Ps rated as more successful showed more pre- to post- 
treatment shifts in Rorschach performance in direction 
of T's Rorschach than did less successful Ps.* 

After treatment, Ps showed changes in Rorschach M:C 
ratio in direction of their T. (Ps whose T had an M:C 
ratio of 2:1 or better gave more C after treatment.) 


3. Interest and Values Tests Measuring P-T Similarity


Correct awareness of similarity (.45 with counseling
progress).

P-T compatibility: Fundamental Interpersonal Relations
Orientation Behavior (FIRO-B) .45 with Supervisor's
rating of patient improvement.*

P-T congruence score (on appropriate therapeutic 
techniques) related to T improvement (.38), P improve- 
ment (.33), but not changes in P target complaint 
severity.* 

Value similarity (Strong Vocational Interest Blank and 
Ways to Live scale). Degree of P-T value similarity 
related (p < .05) to improvement.* 


Carson & Heine (1962) 


Carson & Llewellyn 
(1966) 
Lichtenstein (1966) 


Sheehan (1953) 


Graham (1960) 


Lesser (1961) 


Sapolsky (1965) 


Schonfield et al. (1969) 


Welkowitz et al. (1967) 
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4. Social Class Similarity 


Social class similarities of T and P.* 


5. Other Similarities




Ps who improved most were initially (in first session) more 
similar to their T on Kelly’s Role Construct Repertory 
Test (p < .05); and changed toward T's ideal. 

General P-T similarity (with success).* 

Similarity between counselor's description of P and ideal 
self was related to outcome in only two of six cases.* 

Similarity between P and T self-perceptions was negatively 
related to progress in treatment (Q-sorts?). 


B. Quality of Relationship 


The more favorable the quality of the P-T relationship, the 
more favorable the outcome on 3 of 14 scales measuring 
comfort, effectiveness, and objectivity.* 

Five measures of interview relationship labeled Rapport, 
Blocking, Hostile Resistance, Dependency, and Control- 
ling Resistance were unrelated to outcome.* 


IV. TREATMENT FACTORS 
A. Type of Treatment 


Leading therapy resulted in higher improvement ratings
than did reflective therapy (p < .01).*

Leading versus reflective therapy made no difference in 
outcome.* 

Individual and group therapy were more effective in com- 
bination than was either employed separately ; individual 
therapy employed singly was more effective than group 
therapy. 

Ps in both individual and group therapy improved more 
than Ps in individual therapy.* 

Interaction versus insight therapy: Interaction group im- 
proved more.* 

Desensitization was more effective than psychotherapy 
in decreasing simple phobic responses. 

Desensitization was not more effective than psychotherapy in 
Ps with more complex Symptoms or severe anxiety or 
depression. 

Ps in group or individual psychoanalytically oriented 
therapy were no different from Ps in carbon dioxide 
therapy, in rated outcome. Ps in individual or group 
therapy gave more favorable subjective reports. 

Ps in individual psychotherapy improved more than Ps in
minimal contact (½ hour per 2 weeks) (p < .05) at end of
6 months (in social effectiveness).

Group psychotherapy Ps improved more than Ps with 
minimal contact psychotherapy (p < .01) at end of 
6 months (in social effectiveness).

Group versus individual psychotherapy: No difference in 
outcome (in either social effectiveness or discomfort). 

Desensitization procedure was more effective than insight- 
oriented or placebo procedure (treatment analogue). 
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Hollingshead & Redlich 
(1958) 


Landfield & Nawas 
(1964) 


Tuma & Gustad (1957)
Hunt et al. (1959) 


Lesser (1961) 


Parloff (1961) 


Roth et al. (1964) 


Ashby et al. (1957) 
Baker (1960) 

Baehr (1954)

Conrad (1952) 

Coons (1957) 

Gelder, Marks, & Wolff 


(1967) 
Gelder et al. (1967) 


Harris (1954) 


Imber et al. (1957; also 
Frank et al., 1959) 


Imber et al. (1957; also 
Frank et al., 1959) 


Imber et al. (1957; also
Frank et al., 1959)


Paul (1967) 
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Ps treated by either brief or intensive group therapy showed 
more reduction in California Ethnocentrism Scale (but no 
change in F Scale) than Ps treated by individual psycho- 
therapy. 

Ps receiving adjunctive group therapy in addition to in- 
dividual therapy were more likely to be rated as im- 
proved at termination (66% versus 51%). 

No difference between the three treatments described in 
Imber et al. (1957) at 5 years: individual psychotherapy, 
group psychotherapy, “minimal” psychotherapy.* 


Pharmacological agents versus psychotherapy


Follow-up adjustment unrelated to receiving tranquilizer 
drugs during treatment (i.e., results of psychotherapy with 
and versus without drugs were not different).* 

Ps receiving tranquilizers, phenobarbital, or placebos in 
conjunction with psychotherapy were not different after 
8 weeks of treatment from Ps receiving only psycho- 
therapy.* 

Psychotherapy with drugs and drugs alone were more 
effective than psychotherapy alone, electroconvulsive 
therapy, and milieu, in treating schizophrenia.* 

Nonsignificant trend observed (staff ratings) for Ps re- 
ceiving psychotherapy and insulin-coma therapy to im- 
prove more than Ps receiving only insulin-coma therapy.* 

Ps receiving psychotherapy and insulin-coma 
therapy improved more (psychological test differences) 
than Ps receiving only insulin-coma therapy.* 

Psychotherapy alone, during initial 4 weeks of 7-month 
psychotherapy (rather than medication) yielded higher 
outcomes than for Ps given no treatment.* 

Placebo (i.e., ineffectual help) in initial 4 weeks of 7-month 
psychotherapy reduced the gain.* 

Psychotherapy in combination with Librium did not produce 
significantly more improvement than Librium alone.* 

Ps of Type B therapists did better with insulin in addition 
to psychotherapy.* 


B. School of Treatment 


Psychoanalytic versus client-centered therapy: no difference 
in degree of experiencing and level of self-observation. 

Psychoanalysis versus psychotherapy performed by 
psychoanalysts: Psychoanalysis produced more symptom 
change and character structure change than psycho- 
therapy.* 

Ps in psychoanalytic, nondirective, and Adlerian psycho- 
therapy reported no significant difference in amount 
of change.* 

No significant difference between Adlerian and client- 
centered approaches. 

Sullivanian compared to Rogerian treatment methods.* 

Greater improvement occurred for patients with higher 
"relatability" (quality of object relations) for both client- 
centered and psychoanalytically oriented psychotherapy.* 

Psychoanalytically oriented therapy was less successful than 
“rational emotive therapy” (by one therapist employing 
two methods over time).* 
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Pearl (1955) 


Peck (1949) 


Stone et al. (1961) 


Fairweather et al. (1960) 


Lorr et al. (1961) 


May (1968) 


Roos (1961) 


Roos (1961) 


Roth et al. (1964) 


Roth et al. (1964) 
Roth et al. (1964) 


Whitehorn & Betz 
(1957) 

R. D. Cartwright (1966) 

Hamburg et al. (1967) 

Heine (1953) 

Shlien et al. (1962) 

Tougas (1954) 


Isaacs & Haggard (1966) 


Ellis (1957) 
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C. Time-Limited versus Unlimited Treatment 


Compared to Ps in long unlimited treatment, Ps in brief 
time-limited treatment showed a marked decline in affect 


Henry & Shlien (1958) 


differentiation (on TAT), but no difference on Therapist 0 Henry & Shlien (1958) 
rating, Behavioral Index, and Q-sort.* 

Time-limited and short-term groups improved more than + Muench (1965) 
long-term groups (on Rotter Test and Maslow Security- 

Insecurity Inventory).* 

Time-limited client-centered treatment compared favorably 0 Shlien (1957) 
with longer, unlimited treatment, on most outcome 
measures.* 

Time-limited treatment (20 sessions) versus unlimited 0 Shlien et al. (1962)
treatment (median: 37 sessions). 

70% of Ps treated for 6 months versus 74% who dropped 0 Frank et al. (1959) 
out in the first month showed a decrease in discomfort.

"Ideal" long-term treatment, brief supportive treatment, 0 Pascal & Zax (1956)
and environmental manipulation produced a high but
not a different level of change.

D. Number of Sessions and Duration of Treatment 

Strong relationship between duration and improvement.* + Bailey et al. (1959) 

Number of sessions related to improvement, + Bartlett (1950) 

Number of sessions related to success ratings (no data on + D.S. Cartwright (1955) 
treatment length or on frequency.) 

Log number of sessions: rs with T's rating of movement on + D. S. Cartwright, Robertson,
personal integration (r = .36). Success rating, .49* Fiske, & Kirtner (1961)
(p < .001).

The most improved Ps continued in treatment the longest + Conrad (1952) 

(of 50 VA outpatients in short-term treatment).* 

Length of treatment (6 to 10, versus 21+ sessions) un- 0 Errera, McKee, Smith, 
related to outcome* (also judges' ratings). & Gruber (1967) 

Length (log weeks, log interviews); with success rating*: + Fiske et al. (1964) 
interview diagnosticians’ perception of change, from 
listening to first and last interviews (.36). 

Hours of treatment related (p < .01) to outcome.* + Getter & Sundland (1962)

Number of sessions related to improvement (self-rating; + Graham (1958)
adult neurotics).

Number of sessions related to improvement (adult 0 Graham (1958)
psychotics).

Duration of treatment more influential than sheer number +  Lorr (summary of 
of sessions.” literature, 1962) 

Number of sessions related to outcome (at 1-year follow-up)* + Lorr, McNair, Michaux, 

& Raskin (1962) 

Number of sessions related to social and psychological change, + McNair et al. (1964)
symptom reduction, and greater insight.*

Unimproved Ps tended to have fewer treatment hours than + Mensh & Golden (1951)
improved Ps.* 

Curvilinear relationship with definitely better group receiving + Miles et al. (1951)
more treatment sessions than either the markedly
improved or the unchanged group.

Number of sessions (a minimum number required for improvement) + Myers & Auld (1955)
(p < .001).*

Number of sessions related to improvement.* + Nichols & Beck (1960)

Number of sessions (trend) ; more than 20 sessions.* + Seeman (1954) 


Log number of sessions relative to T's rating of movement
in personal integration. Change in level of personal
integration related to case length.*
Length of treatment associated with success.* 


Number of sessions related to improvement. (No data on 
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treatment length or frequency.) Outcome was related to 
length of treatment, with exception of a failure zone 


between the thirteenth and twenty-first interviews.* 
Length of treatment was related (p < .01) to outcome. 


E. Frequency (Rate) of Sessions 


Adult neurotics improved more (self-rating) when seen twice 


weekly versus once weekly. 


Adult psychotics got worse on a twice-weekly schedule. 
Frequency (sessions per week) at 4-, 8-, and 12-month 


+ Standal & van der Veen 
(1957) 

+ Sullivan (1958) 

+ Taylor (1956) 

+ Tolman & Mayer (1957) 

+ Graham (1958) 

— Graham (1958) 

0 Lorr et al. (1962) 


points unrelated to outcome* (also other outcome measures.) 


F. Waiting Interval Between Application and Starting Psychotherapy 


Waited for 60 days before psychotherapy negatively 


related to improvement.* 


Waited for 28 days before psychotherapy less likely to = 


improve.* 


Length of wait between evaluation and treatment nega- 


tively related to improvement. 


— Gordon & Cartwright 
(1954) 
Roth et al. (1964) 


— Uhlenhuth & Duncan 
(1968) 


(Received October 17, 1969) 
In previous research, strength of preference for one gamble of a pair has been related
to expected value difference (D) or to an index based on regret (R). In previous
studies, however, D and R values have been strongly confounded. Therefore, the
relations reported could have been due to D, to R, or to both. The present paper
discusses the mathematical relationships between D and R, and gives a method of
constructing sets of gamble pairs orthogonal in D and R.


The present paper concerns choice situations
that can be characterized by the following
payoff matrix:

          P      1 − P
   α      a        b
   β      c        d


On each trial, the subject must choose one of
two alternatives, α or β. There are two so-called
"states of nature," corresponding to the
two columns. The left column obtains on a
trial with probability P, and the right column
obtains with probability 1 − P; a, b, c, and d
are the possible payoffs (usually real or make-believe
monetary amounts) to the subject;
exactly one of these payoffs applies on a trial,
depending on the choice and the state of nature
for that trial. α and β can be called gambles,
in that given the choice of either, the payoff
depends on a chance outcome.

A large amount of the experimental literature 
concerning such choice situations has appeared 
within the field of “probability learning.” The 
basic paradigm for probability learning re- 
quires the subject to predict for each of a 
series of trials which of two alternative prob- 
abilistic events will occur. In one variation, 
the subject receives a payoff on each trial 
depending on the correctness of his predic- 
tion. This variation is formalized by the above 
payoff matrix; α and β represent the two
possible predictions (choices). The “events” 
the subject must predict are the "states of 
nature." 

The typical subject does not choose consistently
(i.e., identically) over a series of
trials in probability learning experiments. Let
p(αβ) be the probability that the subject
chooses α rather than β. The magnitude of
p(αβ) might be said to indicate the "strength
of preference" for gamble α over gamble β. If
p(αβ) is close to 1, the strength is high, and
it is less for smaller values of p(αβ). If p(αβ)
is less than .5, β is chosen more often than α.
Under such circumstances, the term "strength
of preference for α over β" may seem unnatural,
but we accept it for the sake of generality.
Alternatively, one can speak of the "strength
of preference for β over α," p(βα) = 1 − p(αβ).
Keep in mind that even if the "strength of
preference" for α over β is high, β is chosen
over (preferred to) α on some trials. In practice,
p(αβ) is estimated by the relative frequency of
choosing α over a series of trials after the
subject appears to have reached a fairly stable
response probability.

1 Requests for reprints should be sent to Wayne Lee,
1516 Sonoma Avenue, Albany, California 94706.

Alternatively, "strength of preference"
might be measured by a direct rating by the
subject. This method might require the subject
not only to choose α or β, but also to rate
his strength of preference on a verbal scale
such as "very strong preference," "moderately
strong preference," etc. With the rating
method, one need not present the subject with
a long series of the same payoff matrix: one
can change the matrix from trial to trial,
since a strength of preference measure can be
obtained for a single trial.

The exposition in this paper proceeds in
terms of p(αβ), which is readily understood.
Comparable results would appear to obtain,
however, with the rating response. To underscore
the generality, p(αβ) is usually referred
to as "preference strength" rather than simply
"probability." Although ratings and p(αβ)
may someday be shown to have some important
differences as measures of preference strength,
such differences are not apparent at present,
so no distinctions are made between articles
using one method or the other.

Two indexes on the payoff matrix have been 
proposed to account for differences in strength 
of preference for one gamble over another with 
different payoff matrices. One is the difference 
in expected value (D) for the two gambles. 
The notion has been that the more two gambles 
differ in expected value, the greater will be 
the strength of preference for the higher- 
expected value gamble. As far as the present 
author is aware, this idea was first proposed 
by Mosteller and Nogee (1951); they sug- 
gested that choice probability increases mono- 
tonically with difference in expected utility 
between gambles, and that the gamble with 
the higher expected utility has the higher 
probability of being chosen. Mosteller and 
Nogee expressed their ideas in terms of utili- 
ties instead of an objective quantity, but 
subsequent authors have generally concerned 
themselves with expected values calculated 
directly from monetary payoffs. 

The other index is called here the expected 
regret ratio (R); Suydam (1965) called it the 
“expected loss ratio.” (R is described below.) 
The notion has been that the greater the R for 
a pair of gambles, the greater will be the 
preference strength. In other words, it has 
been hypothesized that R can predict pre- 
ference strength in the same manner as D, 
though the proposals have been made by 
different authors who typically have not con- 
sidered the alternate index. 

The R index can be traced to Edwards 
(1956); his index, relative expected loss, is 
linearly related to R (Suydam, 1965). Edwards 
found that the relative expected loss for a 
payoff matrix did an excellent job of predict- 
ing relative choice probabilities for different 
payoff matrices. (Predictions using R would 
be the same.) Edwards did not report what 
relationship held in his data between D and 
preference strength. 

Subsequent to the papers of Mosteller and 
Nogee (1951) and Edwards (1956) a number 
of studies have been published which have 
assessed the merits of D and R for predicting 
relative preference strength. The studies have 
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utilized objective payoff rather than utility 
for calculating D and R. Like the Mosteller 
and Nogee and Edwards papers, many of these 
papers dealt with D or R, with little or no 
analysis of the alternate index. Myers, Reilly, 
and Taub (1961) only considered D, whereas 
Taub and Myers (1961) only considered R. 
Both indexes were found to predict preference 
strength, with R appearing to do a better job. 
Suydam (1965) and Myers, Suydam, and 
Gambino (1965) compared both D and R on the 
same data sets. Again, both indexes were effec- 
tive, but R seemed superior. 

Both D and R, then, would appear to have 
some merit. However, in all previous studies 
the two indexes have been strongly confounded. 
Could it be that one index alone is effective, 
and that reports favoring the alternate index 
were due to confounding? It would appear that 
the R index, at least, is necessary to account 
for experimental results. Edwards (1956), Taub 
and Myers (1961), and Suydam (1965) re- 
ported correlations in the .90s between R and 
mean preference strength (across subjects). 
The relation between expected value dif- 
ference and preference strength was less im- 
pressive (Myers et al., 1961; Suydam, 1965). 
Furthermore, Myers and Sadler (1960) have 
reported a peculiar interaction that can be 
accounted for in terms of R, but not in terms of 
D, even if D is interpreted in terms of utility. 
On the other hand, immersed as psychology is 
in expected value theory and its variations, it 
is very difficult to believe that a subject’s pre- 
ference strength for a gamble does not increase 
as its expected value becomes greater and 
greater relative to the expected value of the 
alternative, regardless of whatever else is held 
constant. 

The purpose of the present paper is to pre- 
sent mathematical relationships between D 
and R, to explain why strong confounding has 
occurred between D and R, and to give 
formulas for the construction of sets of gamble 
pairs unconfounded (orthogonal) in D and R. 
The use of such sets in future research should 
help to clarify the effects of D and R on preference
strength.


Mathematics of D and R


For the succeeding mathematical developments,
we assume that 0 < P < 1; that is,
each state of nature has a nonzero probability
of occurring. We also assume that a > c, and
d > b. This assumption specifies "choice
conflict," that is, α is preferable for one state
of nature, whereas β is preferable for the other.
The subject does not know which state of
nature obtains until he commits himself to a
choice; therefore, he can imagine either choice
as being superior, depending on the state of
nature. Choice conflict would also obtain if
the inequality signs were reversed, that is, if
c > a, and b > d. These latter conditions would
require a different but redundant mathematical
development. The succeeding mathematical
development can apply to the latter "reversed"
conditions simply by exchanging the rows of
the payoff matrix and relabeling so that a > c,
and d > b.

The condition of choice conflict has been 
of greatest interest to psychologists ; however, 
other matrices are possible, in particular, those 
characterized by dominance. If dominance 
holds, one gamble can be confidently re- 
jected, even without prior knowledge about 
the state of nature, since the alternate gamble 
has superior or equal payoffs for both states 
of nature (superiority for at least one state). 
The succeeding development is not applicable 
to dominance matrices, though they are con- 
sidered briefly later. 

Let $EV_\alpha$ and $EV_\beta$ be the expected values for
α and β: $EV_\alpha = Pa + (1 - P)b$; $EV_\beta = Pc + (1 - P)d$.
Let $D(\alpha\beta) = EV_\alpha - EV_\beta$; $D(\beta\alpha) = EV_\beta - EV_\alpha = -D(\alpha\beta)$.
At times the notation D is used without a qualifier to refer to
either $D(\alpha\beta)$ or $D(\beta\alpha)$. Let $x = a - c$, and
$y = d - b$. x and y are positive by the inequality
assumption concerning choice conflict. From
the expected value formulas one can derive

$$D(\alpha\beta) = Px - (1 - P)y \tag{1}$$



For any payoff matrix a corresponding re- 
gret matrix can be derived. Each entry in the 
regret matrix is derived by subtracting the 
corresponding payoff from the maximum payoff 
in the same column. In symbols, taking ac- 
count of the assumption concerning choice 
conflict, the regret matrix for the above payoff 
matrix is 

          P      1 − P
   α      0        y
   β      x        0
The expected regret (ER) for α or β can be
calculated in a manner analogous to the calculation
of expected value, except that regrets
replace the payoffs. $ER_\alpha = (1 - P)y$, and
$ER_\beta = Px$. Comparison of these formulas with
Equation 1 yields

$$D(\alpha\beta) = ER_\beta - ER_\alpha \tag{2}$$

By definition, the expected regret ratio (R) is

$$R(\alpha\beta) = \frac{ER_\beta}{ER_\alpha + ER_\beta} \tag{3}$$

$$R(\beta\alpha) = \frac{ER_\alpha}{ER_\alpha + ER_\beta} = 1 - R(\alpha\beta) \tag{4}$$


This index was described by Suydam (1965),
who called it "expected loss ratio." Although
the quantity called "regret" in the present
paper is frequently called "loss" instead, loss
can also mean negative payoff, so "regret" is
used here.

By combining Equations 2 and 3, it is
possible to demonstrate that

$$R(\alpha\beta) > \tfrac{1}{2} \text{ if and only if } D(\alpha\beta) > 0 \tag{5a}$$

$$R(\alpha\beta) = \tfrac{1}{2} \text{ if and only if } D(\alpha\beta) = 0 \tag{5b}$$

$$R(\alpha\beta) < \tfrac{1}{2} \text{ if and only if } D(\alpha\beta) < 0 \tag{5c}$$

Substitution for $ER_\alpha$ and $ER_\beta$ in Equation 3
yields

$$R(\alpha\beta) = \frac{Px}{Px + (1 - P)y} \tag{6}$$

All components on the right-hand side of
Equation 6, that is, P, (1 − P), x, and y, are
positive. This, with Equation 4 and considering
the form of Equation 6, yields

$$0 < R < 1,$$

where R is used generally to mean $R(\alpha\beta)$ or
$R(\beta\alpha)$.
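The definitions in Equations 1 through 6 can be checked numerically. The following is a minimal sketch, not part of the original paper: the function name `indexes` and the example payoffs are illustrative assumptions, and exact rational arithmetic is used only to avoid rounding noise.

```python
from fractions import Fraction

def indexes(a, b, c, d, P):
    """Return (D, R), both scored on alpha, for the payoff matrix
    [[a, b], [c, d]] (hypothetical helper, not from the paper).

    Assumes choice conflict (a > c and d > b) and 0 < P < 1.
    """
    if not (a > c and d > b):
        raise ValueError("choice conflict requires a > c and d > b")
    if not (0 < P < 1):
        raise ValueError("P must lie strictly between 0 and 1")
    x, y = a - c, d - b                  # nonzero entries of the regret matrix
    er_alpha = (1 - P) * y               # expected regret of alpha
    er_beta = P * x                      # expected regret of beta
    D = er_beta - er_alpha               # Equation 2
    R = er_beta / (er_alpha + er_beta)   # Equation 3, same value as Equation 6
    return D, R

# Illustrative payoffs (an assumption chosen for easy arithmetic).
P = Fraction(1, 2)
D, R = indexes(24, 0, 0, 16, P)
ev_alpha = P * 24 + (1 - P) * 0
ev_beta = P * 0 + (1 - P) * 16
print(D, R)
```

For these payoffs the expected-value route (Equation 1) and the expected-regret route (Equation 2) give the same D, and R falls above 1/2 exactly when D is positive, as Equation 5 requires.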

If P, D, and R are specified, Equations 1 and
6 are two equations with two unknowns, x and
y. The solution is

$$x = \frac{D(\alpha\beta)\,R(\alpha\beta)}{P\,[2R(\alpha\beta) - 1]} \tag{7}$$

$$y = \frac{D(\alpha\beta)\,[1 - R(\alpha\beta)]}{(1 - P)\,[2R(\alpha\beta) - 1]} \tag{8}$$

Once x is found, any values of a and c may be
used giving x = a − c, and likewise, once y is
found, any values of b and d may be used giving
y = d − b.

A set of gamble pairs orthogonal in R and D
can be constructed by using, say, each of three
values of D(αβ) with each of three values of
R(αβ). One would have a set of nine gamble
pairs orthogonal in R and D. Table 1 illustrates
such a set for P = 1/2. The payoffs b and c
were arbitrarily assigned values of zero. P can
be varied orthogonally with both R and D,
if desired.
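The construction just described can be sketched in a few lines. This is an illustrative sketch only: the function name `gamble_pair` and its argument conventions are assumptions, not the author's procedure, and it simply evaluates Equations 7 and 8 with b and c defaulting to zero.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def gamble_pair(D, R, P, b=0, c=0):
    """Return payoffs (a, b, c, d) having the requested D and R.

    Hypothetical helper: D, R, P may be ints, Fractions, or strings
    such as "0.6" or "1/2"; b and c are free and default to zero.
    """
    D, R, P = Fraction(D), Fraction(R), Fraction(P)
    if not (0 < P < 1 and 0 < R < 1):
        raise ValueError("need 0 < P < 1 and 0 < R < 1")
    # Equation 5: D > 0 must go with R > 1/2, D < 0 with R < 1/2.
    if D == 0 or R == HALF or (D > 0) != (R > HALF):
        raise ValueError("D and R violate the conditions of Equation 5")
    x = D * R / (P * (2 * R - 1))                # Equation 7
    y = D * (1 - R) / ((1 - P) * (2 * R - 1))    # Equation 8
    return (c + x, b, c, b + y)                  # a = c + x, d = b + y

# Nine pairs: three D values crossed with three R values, P = 1/2.
table = {
    (D, R): gamble_pair(D, R, "1/2")
    for D in (4, 8, 16)
    for R in ("0.6", "0.75", "0.9")
}
for key, payoffs in table.items():
    print(key, [str(v) for v in payoffs])
```

Crossing every D value with every R value is what makes the two indexes orthogonal across the set; the check against Equation 5 rejects combinations for which no conflict matrix exists.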

In specifying values of R and D for an 
orthogonal set of gamble pairs, one must not 
violate the conditions of Equation 5. For 
example, R values greater than 1/2 cannot be 
combined with nonpositive D values. Sets of 
orthogonal gamble pairs must have only R 
values greater than 1/2 combined with D 
values greater than 0, or R values less than 1/2 
combined with D values less than 0. A D of 0 
cannot be used, since no variation of R is 
possible with a D of 0. Otherwise, D may have 
any magnitude; R, however, is restricted to
0 < R < 1.

Given that one has a gamble pair with 
specified values of R, D, and P, one can add 
an arbitrary constant (possibly negative) to 
the entries in either column of the payoff 
matrix without changing D or R, since, as is 
clear from Equations 1 and 6, D and R depend 
only on the differences between column entries, 
not on the absolute values. 

If all entries in a payoff matrix are multiplied 
by a positive constant k, R is unchanged, but 
the expected value difference is altered by the 
factor k (i.e., the new D is k times the original 
D).

So much for the case of choice conflict. When
one gamble dominates the other, R must equal
1 or 0, the limiting values for R. This results
from the fact that the regret matrix has a row
with two zero entries, so either $ER_\alpha$ or $ER_\beta$
equals 0. If dominance characterizes a payoff
matrix, the mathematical development given
above for conflict matrices does not apply;
in particular, one cannot calculate R from
Equation 6.
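The invariance properties and the dominance case described above can be illustrated numerically. In this sketch the function `d_and_r` is a hypothetical helper that computes regrets from column maxima, so R follows the definition in Equation 3 rather than Equation 6; the payoffs are arbitrary illustrative values.

```python
from fractions import Fraction

def d_and_r(a, b, c, d, P):
    """D from expected values and R from the regret matrix, scored on alpha.

    Hypothetical helper. Regret is the column maximum minus the payoff,
    so the function also covers dominance matrices. It assumes the two
    rows are not identical (otherwise both expected regrets are zero).
    """
    ev_alpha = P * a + (1 - P) * b
    ev_beta = P * c + (1 - P) * d
    er_alpha = P * (max(a, c) - a) + (1 - P) * (max(b, d) - b)
    er_beta = P * (max(a, c) - c) + (1 - P) * (max(b, d) - d)
    return ev_alpha - ev_beta, er_beta / (er_alpha + er_beta)

P = Fraction(1, 2)
base = d_and_r(24, 0, 0, 16, P)

# Add 5 to the left column and -3 to the right column: D and R unchanged.
shifted = d_and_r(24 + 5, 0 - 3, 0 + 5, 16 - 3, P)

# Multiply every payoff by k = 3: R unchanged, D multiplied by 3.
scaled = d_and_r(3 * 24, 0, 0, 3 * 16, P)

# Dominance: alpha is better in both columns, so R(alpha, beta) = 1;
# exchanging the rows gives R = 0.
dominant = d_and_r(5, 4, 3, 2, P)
dominated = d_and_r(3, 2, 5, 4, P)

print(base, shifted, scaled, dominant, dominated)
```

Computing regrets from column maxima is a design choice made here so that one function handles both conflict and dominance matrices; for conflict matrices it reduces to Equation 6.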
TABLE 1


PAYOFFS FOR NINE GAMBLE PAIRS ORTHOGONAL IN D
(DIFFERENCE IN EXPECTED VALUE) AND
R (EXPECTED REGRET RATIO)


                        R
  D        0.6         0.75         0.9
  4      24   0      12   0       9   0
          0  16       0   4       0   1
  8      48   0      24   0      18   0
          0  32       0   8       0   2
 16      96   0      48   0      36   0
          0  64       0  16       0   4


Confounding of D and R in Previous Research 


One would hardly expect the D and R indexes
for a set of gamble pairs to be orthogonal by
happenstance. The degree of confounding that
has existed in previous research, however, has
been extreme. The strong confounding was
largely due to what might be called the
"arbitrary" scoring methods used in past
investigations. For a given gamble pair, should
one calculate R(αβ) and p(αβ), or should one
calculate R(βα) and p(βα)? As noted, R(βα)
is simply equal to 1 − R(αβ), and p(βα)
= 1 − p(αβ), yet the variants chosen can make
a vast difference in the apparent success of the
index as a predictor of preference strength. An
example: Suppose one payoff matrix has
possible choices α and β, and a second matrix
has possible choices γ and δ. Suppose R(αβ)
= .7, p(αβ) = .80, R(γδ) = .6, and p(γδ)
= .85. The hypothesized positive relationship
between R and p fails, since the pair of gambles
with the higher R has the smaller p. But suppose
one scored instead R(βα) = .3 and
p(βα) = .2, while maintaining the original
scoring for the γδ gamble pair. Then there is a
positive relationship between R and p. A
similar analysis could be presented for D or for
a rating-scale measure of preference strength.
For each pair of gambles one has the choice of
scoring R > .5 (D > 0), or R < .5 (D < 0).
No rule has been presented to constrain the
alternatives; experimenters have used "arbitrary"
scoring methods resulting in some pairs
of gambles in a set being scored with R > .5
and some with R < .5. For example, Edwards
(1956) took "α" in R(αβ) to be the "left"
choice in a probability learning task; Taub
and Myers (1961) scored on the choice with
the higher probability of winning; and Suydam
(1965) scored on the "red" choice. It would
be better if experimenters always scored
R > .5 (D > 0). This would reduce the confounding
considerably. Call this the nonarbitrary
scoring method. It is implicit in the
method for constructing orthogonal sets that
nonarbitrary scoring be used; that is, if the
gamble pairs are all constructed with R > .5,
they must be scored in the same way. (Alternatively,
one could use R < .5 for construction
and scoring, but the convention given
seems more convenient.)
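The two-pair example above can be replayed in code. This is a small sketch under stated assumptions: the function name `nonarbitrary` is hypothetical, and the numbers are the illustrative values from the text, not data from any of the cited experiments.

```python
from fractions import Fraction as F

def nonarbitrary(R, p):
    """Rescore a gamble pair on the gamble for which R > 1/2 (so D > 0).

    Hypothetical helper: R and p must refer to the same gamble of the
    pair; flipping replaces them by 1 - R and 1 - p.
    """
    if R == F(1, 2):
        raise ValueError("R = 1/2 means D = 0; neither gamble is favored")
    return (R, p) if R > F(1, 2) else (1 - R, 1 - p)

# "Arbitrary" scoring from the text: the first pair scored on beta
# (R = .3, p = .2), the second pair scored on gamma (R = .6, p = .85).
arbitrary = [(F("0.3"), F("0.2")), (F("0.6"), F("0.85"))]
rescored = [nonarbitrary(R, p) for R, p in arbitrary]

def higher_R_has_higher_p(pairs):
    """True if the pair with the larger R also has the larger p."""
    (r1, p1), (r2, p2) = pairs
    return (r1 > r2) == (p1 > p2)

print(higher_R_has_higher_p(arbitrary))   # apparent positive relationship
print(higher_R_has_higher_p(rescored))    # relationship fails after rescoring
```

The same observations support or contradict the hypothesized relationship depending only on which gamble of each pair is scored, which is the point of the nonarbitrary convention.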

The strong confounding that results from
the arbitrary scoring method can be clearly
seen in Table 1 of Suydam (1965). The six
smallest R values, .01 and .02 (five such), have
associated D values of −17.8, −13.3, −9.7,
−13.3, −8.8, and −9.7, respectively. The
four largest R values, .9, .75 (two such),
and .7, have associated D values of 1.6, 1.2
(two such), and 0.8, respectively. Intermediate
R values have intermediate D values associated
with them.

In view of the strong confounding of D 
with R, one must have severe reservations 
about the apparently extreme success of R as 
a predictor of preference strength. Edwards 
(1956), for example, made pair comparisons 
for all pairs of gamble pairs, and 94% of the 
results accorded with the R predictions. If he 
had scored in the manner suggested, however, 
his result would have undoubtedly been less 
impressive (but more meaningful). (It should 
be pointed out that Edwards, 1956, included 
four-outcome as well as two-outcome gambles 
in his experiment. Our analysis has not dealt 
with more than two outcomes.) After all, 
one can almost guarantee the prediction success 
of R when comparing two gamble pairs, one 
with R > .5, and one with R < .5. It must
hold, for example, if the higher expected value
gambles are always preferred, that is, if, when
$EV_\alpha > EV_\beta$, p(αβ) > p(βα). This has been
the case for the large majority of gamble pairs 
that have been utilized: always for Taub 
and Myers (1961), Edwards (1956), and Suy- 
dam (1965), and in 24 out of 26 cases for 
Myers et al. (1961). 
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It is standard in this literature to report a 
correlation between the R (or D) index and 
average preference strength. Even if D had no 
effect, such correlations have to be inflated 
by the arbitrary scoring methods, since,
compared to nonarbitrary scoring, arbitrary 
methods increase the abscissa (R) variance 
without increasing the average residual vari- 
ance. With this in mind (as well as the con- 
founding), we are less impressed than we would 
otherwise be by the product-moment cor- 
relations between mean preference strength 
and Rs of .96, .97, and .98 reported by Edwards 
(1956), Suydam (1965), and Taub and Myers
(1961), respectively. 
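The inflation argument can be made concrete with a toy data set. Everything in this sketch is an assumption for illustration: the eight (R, p) values are invented so that R and p are exactly uncorrelated under nonarbitrary scoring, and they are not taken from any of the cited studies.

```python
from math import sqrt

def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical gamble pairs, all scored nonarbitrarily (R > .5).
# The higher expected value gamble is always preferred (p > .5),
# but within this range p does not track R at all.
R = [.6, .7, .8, .9, .6, .7, .8, .9]
p = [.8, .6, .9, .7, .7, .9, .6, .8]
r_nonarbitrary = pearson(R, p)

# "Arbitrary" scoring: the last four pairs are scored on the other
# gamble, which replaces R by 1 - R and p by 1 - p for those pairs.
R_arb = R[:4] + [1 - r for r in R[4:]]
p_arb = p[:4] + [1 - q for q in p[4:]]
r_arbitrary = pearson(R_arb, p_arb)

print(round(r_nonarbitrary, 3), round(r_arbitrary, 3))
```

In this toy set the correlation rises from essentially zero to roughly .83 through rescoring alone, because flipping spreads R over a wider range while each flipped p stays on the same side of .5 as its R.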

Quite apart from the confounding and the 
inflated correlations, arbitrary scoring is poor
because the result presented to the reader, 
for instance, the correlation, is subject to 
arbitrary variation that the reader is unaware 
of. He does not realize that with the same data 
there are various schemes for deciding which 
gamble should be scored, and that the choice 
can have a very large effect on the ostensible 
success of the index R (or D) as a predictor. 


Conclusions 


A method was presented for constructing 
sets of gamble pairs orthogonal in expected 
value difference (D) and expected regret 
ratio (R). In past research these two indexes 
have been strongly confounded, so that the 
apparent success of either index in predicting 
preference strength might have resulted to 
some extent, at least, from the effect of the
other index. A nonarbitrary scoring method is 
proposed. The method is necessary (but not 
sufficient) to avoid confounding. The use of 
orthogonal sets in future research should help 
to clarify the effects of D and R on preference 
strength. 
"INFANTILE STIMULATION" IN RODENTS: 
A CONSIDERATION OF POSSIBLE MECHANISMS 


P. A. RUSSELL 1
University of Hull, England 


Four major types of hypothesis concerned with the mediation of the effects
of handling or otherwise treating infant rodents are reviewed, together with
some evidence bearing on them. Existing experiments do not allow a clear
statement of the possible roles of (a) tactile stimulation (direct action), (b)
hypothermia, (c) maternal behavior, and (d) stress. These hypotheses may not
be mutually incompatible, and some possible lines of synthesis are suggested.
Conceptual problems attaching to them are emphasized with regard to the
choice between a relatively nonspecific theory (“total stimulus input,” “stress”)
or two or more separate mechanisms (hypothermia, tactile stimulation). The
solution may come from a clearer understanding of the physiological systems
affected by treatment.


Levine (1962), reviewing what was even
then a considerable body of evidence con-
cerning the effects of handling, shocking, or
in other ways “stimulating” infant rodents
prior to weaning, concluded that “the solution
to the problem of infantile stimulation and
subsequent behaviour will only come when
both independent and dependent variables are
adequately specified [p. 155].” It is the pur-
pose of the present paper to examine some of
the attempts which have subsequently been
made to specify the nature of the independent
variable in infantile treatment studies. Given
that infantile handling or shock can effectively
modify adult behavior, what aspect of these
procedures is of functional importance in
mediating the effects?

Consideration of the types of treatment
employed and the effects produced can be
found, for example, in Levine (1962) and
Denenberg (1967). In general terms, this
paper is concerned with studies which have
examined the effects on adult behavior pro-
duced by removing pups from the nest-cage
and their mother for a short period each day
for various intervals prior to weaning. Pups
may or may not be subjected to additional
stimulation (“gentling,” electric shock) during
this period of exposure. The behavioral effects
of such treatments have generally been
interpreted in terms of the “emotionality”
construct.


1 Thanks are due to Dr. D. I. Williams for advice
and encouragement during the evolution of this
paper, which was prepared while the author was in
receipt of a Research Studentship grant from the


The Direct Action Hypothesis 


Levine (1962) advanced the specific hy- 
pothesis that the effects of treatment are 
mediated by the direct action of stimulation 
impinging upon the young organism. It is 
assumed that the additional stimulation de- 
riving from treatment (whether mere handling 
or handling plus gentling or shock) serves to 
modify physiological systems in the neonate. 
Since most treatments have been applied and 
found to be effective at an age when the visual 
and auditory senses of the young rodent are 
not yet functional, it is further assumed that 
the operative aspect of treatment is the 
tactile/kinesthetic stimulation involved. There 
have been few studies using visual or auditory 
stimulation as treatments, but the evidence 
available suggests that these are not effective. 
Levine (1962) has reported that increasing 
the illumination level during rearing has no 
discernible effect on adult behavior. Denen- 
berg, Schell, Karas, and Haltmeyer (1966) 
experimentally rejected the hypothesis that 
stimulation arising from the surrounding en- 
vironment during infancy (e.g., “quiet” versus 
“noisy” rooms) has the same functional effect 
as stimulation imparted by handling. Since it 
could be argued that habituational factors 
may be operative here, rather more relevant
evidence comes from those studies which have 
examined the effects of stimulating the pups 
in the auditory modality for a short period 
each day. Hall and Whiteman (1951) found 
that subjecting mice to intense auditory 
stimulation for 2 minutes on each of Days 4, 
5, and 6 of life produced an increase in the 
amount of defecation and urination in a test 
situation given at 30-40 days. Strictly, the 
result cannot be taken as an indication of the 
effectiveness of auditory stimulation since the 
treated animals were handled during treat-
ment, but were compared with nonhandled 
controls. Spence and Maher (1962) employed 
appropriate controls in this respect and failed 
to find any effects due to intense aperiodic 
auditory stimulation delivered to rat litters 
during a 2-minute period on each of Days 1- 
20, using an adult water-consumption test fol- 
lowing deprivation. 

For this reason infantile treatments are gen- 
erally seen as imparting additional tactile 
stimulation to the pup. Following Denenberg’s 
(1964) hypothesis, we might expect the total 
amount of such stimulation received by the 
pup prior to weaning to be the crucial varia- 
ble. 

Thus, one possible interpretation of the 
role of the treatment variable in infantile 
treatment studies is that it provides addi- 
tional tactile stimulation which acts directly 
on the pup to effect changes in physiological 
systems which are responsible for mediating 
the observed changes in adult behavior, and 
that the crucial factor is the total amount of 
such stimulation received prior to weaning. 

While this type of direct action hypothesis 
had been previously questioned (Levine, 1959; 
Schaefer, 1957), it is only relatively recently 
that it has been directly challenged by ex- 
perimental evidence. The challenge has 
stemmed largely from two kinds of hypothe- 
ses—one centering upon the role of hypo- 
thermia and the other upon the role of ma- 
ternal factors. These two hypotheses are con- 
sidered in the following sections, and finally 
the concept of “stress,” as it has been applied 
to findings in this area, is examined with 
reference to the data generated by the two
hypotheses.


The Cooling Hypothesis 


Schaefer and Weingarten (1962) argued 
that since nontreated pups in treatment stud- 
ies receive frequent and intense tactile stimu- 
lation during the normal course of maternal 
care, the additional stimulation received by 
treated pups hardly seems sufficient to medi- 
ate the significant effects which may be ob- 
tained. It is worth noting that the additional 
tactile stimulation imparted by some handling 
procedures, for example, that used by Denen- 
berg (see Denenberg, 1967 and elsewhere), 
would appear to be extremely minimal. This 
led them to examine the alternative hypothe- 
sis that the effects of treatment are the result 
of incidentally cooling the pups during treat- 
ment. They suggested that under normal con- 
ditions some degree of hypothermia necessarily 
accompanies removal of the neonate from the 
nest, owing to its poorly developed thermo- 
regulating system and large surface-to-volume 
ratio. This assumption has since received 
direct empirical confirmation from measure- 
ment of rectal temperatures in rat pups 
(Hutchings, 1963). Since, almost without ex- 
ception, treatment studies have involved re- 
moval of the pups from the nest-cage, the 
possibility that the effects are mediated by 
cooling must be considered. Schaefer and 
Weingarten were able to show that artificially 
cooling the litter without removing the pups 
from the nest-cage (the cage containing the 
litter was placed in a refrigerator at 7°-12°
centigrade (C.) for 12 minutes on Days 2-7 
of life) produced significant depletion of 
adrenal ascorbic acid (AAA) following cold stress at
12 days. Handling pups during an equivalent 
period produced closely similar effects, 
whereas nontreated controls failed to show 
significant AAA depletion. 

This result does not, of course, constitute 
critical evidence for the cooling hypothesis, 
and Schaefer and Weingarten’s conclusion 
that the operative factor in handling is cool- 
ing does not strictly follow from their demon- 
stration that the effects of handling and cool- 
ing are similar. Subsequent evidence, how- 
ever, has given considerable weight to this 
conclusion. 

One important consequence of a hypothesis, 
which asserts that treatment effects are medi- 
ated not by the direct action of tactile stimu-
lation but by the incidental cooling accompa- 
nying treatment, is that handling, or other- 
wise stimulating, the pup in the absence of 
cooling should have no appreciable effect on 
subsequent behavior. There are now a number 
of reports that this is so. The technique em- 
ployed is to remove the pup from the nest and 
ensure that it does not experience any loss of 
body temperature by maintaining it in an 
environment of appropriate temperature dur- 
ing its absence from the nest. Schaefer (1963) 
found significantly greater depletion of AAA 
following cold stress at 13 days of age in 
handled pups exposed for 8 minutes per day 
at room temperature on Days 2, 3, 4, and 
5 than in nonhandled controls. Animals han- 
dled and exposed at such a temperature as to 
prevent heat loss, however, did not differ sig- 
nificantly from controls. Failure to find dif-
ferences between handled noncooled animals
and nonhandled controls has also been re- 
ported for behavioral measures of emotional- 
ity (Hutchings, 1965; McIver & Camp, 
1966) and for brain amine analysis (Nielson 
& McIver, 1966). Further work (Hutchings,
1967) has involved systematic variation of 
the duration and amount of heat loss in 
handled pups. Hutchings concluded that 
while the absolute amount of heat loss is not 
the critical variable, the rate of heat loss is; 
for instance, a group losing 5°-6°C. body 
temperature in a 3-minute period exhibits 
less adult emotionality than a group losing 
5°-6°C. over a 10-minute period. Hutchings'
results further suggest that the relationship 
between cooling and adult emotionality may 
be of a curvilinear form, with extensive cold 
exposure resulting in emotionality levels not 
significantly different from those of untreated 
animals. 
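
Hutchings' distinction between the amount and the rate of heat loss can be illustrated with simple arithmetic. The sketch below takes 5.5°C as the midpoint of the reported 5°-6°C range, an assumption made only for illustration, and computes the average cooling rate for the two exposure durations mentioned above. The function name is hypothetical.

```python
def cooling_rate(temp_loss_c, minutes):
    """Average rate of body-temperature loss in degrees C per minute."""
    if minutes <= 0:
        raise ValueError("duration must be positive")
    return temp_loss_c / minutes


# 5.5 C is an assumed midpoint of the 5-6 C range; it is not a reported value.
fast = cooling_rate(5.5, 3)    # loss spread over 3 minutes
slow = cooling_rate(5.5, 10)   # same loss spread over 10 minutes
print(f"fast: {fast:.2f} C/min, slow: {slow:.2f} C/min")
```

The two groups lose the same amount of heat, but the first cools more than three times as quickly, and it is this rate that Hutchings identified as the critical variable.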

Results of this kind have led to the notion 
that handling and removing the pup from the 
nest cools it sufficiently “to alter ongoing 
enzymatic reactions involved in develop- 
mental processes, possibly producing perma- 
nent changes in physiological mechanisms 
underlying emotional and stress reactivity 
[Schaefer, 1963, p. 884].” There are, however, 
a number of difficulties facing an interpreta- 
tion of all infantile treatment studies solely in
terms of cooling. First, McIver, Jeffrey,
Stevenson, and Nielson (1968) have reported
some contrary findings with respect to the
prediction that handled noncooled animals
should not differ from untreated controls.
Considering pups treated during the first
week of life, those losing 3°C. over a 3-
minute period exhibited lower emotionality, 
as measured by blood glucose level following 
open-field testing (BGL) and basal resistance 
level (BRL), than either nonhandled pups or 
pups handled without cooling, these latter two 
not differing significantly. This result is in
accord with the cooling hypothesis, but other 
measures were not in accordance. Thus, han- 
dled cooled animals (both 3°C. and 7°C. 
groups) and handled noncooled animals were 
both less emotional than nonhandled animals 
as measured by open-field ambulation scores 
and post-BRL BGL levels (Week 1 animals), 
a result which is difficult to explain with- 
out assuming that some mechanism other 
than cooling is operative. 

Certain other results also strongly suggest 
this. If temperature change was the sole 
mechanism at work, any additional stimula- 
tion (e.g., tactile stimulation) given to pups 
during a period of exposure outside the nest 
should not give rise to effects over and above 
those attributable to mere exposure (i.e., cool-
ing effects). Several existing studies have
(incidentally) provided a test of this—those 
in which the effects of administering electric 
shock to pups placed on a grid have been 
compared with the effects of placing them on
the grid for an equivalent period but not
administering shock. This setup presumably
produces equivalent hypothermia in both 
groups, but the shocked group receive addi- 
tional stimulation. Levine employed this de- 
sign in two studies (1957, 1958). In the first, 
the shocked group (Days 1-20) exhibited 
rather less emotionality, as measured by an 
adult water-consumption test following depri- 
vation (p = 0.1). In the second study, this
difference proved significant at the 5% level.
Similarly, while Denenberg and Smith (1963) 
failed to find significant differences between 
shocked and grid-control animals on an open- 
field test given prior to an avoidance learning 
task, the two groups showed a difference in 
defecation behavior approaching significance 
at the 5% level on a subsequent test. It 
seems, then, that slight effects are demon-
strable, and further work along these lines 
may be indicated. 

Second, treatments which do not involve 
removal of the young from the nest should 
hardly prove to be effective, since there is 
unlikely to be much cooling, and yet there 
are indications that significant treatment ef- 
fects can be demonstrated in this way. Al- 
though Spence and Maher (1962) failed to 
find any effects of an auditory treatment 
administered without removing the pups from 
the nest-cage, it might be argued that this 
treatment was ineffective because it was in the 
auditory modality, rather than because it does 
not give rise to cooling. There is also a spe- 
cial difficulty attaching to the use of auditory 
stimulation in that it is presumably unlikely 
to be detected by the pup until the auditory 
sense becomes functional at about 13 days 
(Bolles & Woods, 1964). More relevant to 
the issue are attempts to stimulate pups 
tactually without removing them from the 
cage. Levine and Lewis (1959) found that 
mechanical shaking of the home-cage for 2 
minutes per day on Days 1-13 was sufficient 
to produce significantly greater depletion of 
AAA following cold stress on the fourteenth
day, compared with untreated controls. How-
ever, Schaefer and Weingarten (1962) ar- 
gued that this effect may not be due to tactual 
stimulation but to cooling, since Levine and 
Lewis removed the mother prior to shaking 
the cage, an operation which according to 
Schaefer and Weingarten may well scatter the 
pups outside the nest proper, and so allow an 
element of cooling to occur. Schaefer (1967) 
has reported that the “rectal temperature of 
two-day old rat pups in the intact nest, with 
or without the mother, remains constant at 
35°C. [p. 131],” so presumably the young 
must be scattered before appreciable cooling 
occurs. Whether scattering of the young is a
feature of one treatment employed and found 
to be effective by Spence and Maher (1962), 
that is, simply carrying the cage containing 
mother and litter from one room to another, 
is not clear. But if such a possibility is ad- 
mitted, then it would seem to be impossible 
to test the cooling hypothesis by treating pups 
without removing them from the nest-cage, 
since it would be difficult to argue conclusively
that there was no scattering of the young and
so no likelihood of cooling. 

Third, if treatment effects are due solely to 
cooling of the pups, then pronounced effects 
should not be obtainable using treatments 
administered late in the preweaning period 
only, since Schaefer (1967) reported that 
after the second week of life the rectal tem- 
perature of the rat pup remains relatively 
constant under a wide range of environmental 
temperatures. The data here are not entirely 
clear-cut. Schaefer (1963) compared groups 
handled during Week 1, Week 2, or Week 3 
with a group handled on all 3 weeks prior to 
weaning. On a test of emotional “crouching” 
in adulthood, only Week 1 and Week 1-3 ani- 
mals were rated as significantly less emo- 
tional than untreated animals, leading Schae- 
fer to conclude that normal handling pro- 
cedures are only effective during the first week 
or so of life. The implication seems to be that 
typical handling procedures applied to the 
older pup (Week 2 and later) are not effec- 
tive in producing hypothermia, owing to the 
development of the capacity for relatively ef- 
ficient thermo-regulation. McIver et al. (1968) 
have examined this notion by subjecting pups 
to low ambient temperatures of an order (pre- 
determined empirically) sufficient to reduce 
their rectal temperatures by 3°C. even during 
the second and third weeks. These tempera- 
tures must necessarily be considerably lower 
than the ambient temperatures normally 
obtaining in conventional handling studies. 
Hypothermia on Week 2 was effective in re- 
ducing emotionality as measured by adult 
open-field ambulation, but hypothermia on 
Week 3 had no effect, compared with un- 
treated controls. Hypothermia on Week 2 
had no effect on BRL (males), but Week 3 
hypothermia effectively increased emotion- 
ality as inferred from this measure. Post open- 
field and post BRL BGLs (see above) were 
not significantly affected by hypothermia on 
Week 3, though lowered emotionality on this 
measure resulted from hypothermia on Week 
2. The finding that at least on some measures 
3°C. hypothermia was as effective on Week 2 
as on Week 1 is consistent with the hypothesis 
that the “critical period” for handling located 
during Week 1 (Schaefer, 1963) is attribut- 
able to the failure of handling procedures applied
later to effectively produce hypothermia
in older pups. But on the other hand, the fail- 
ure to find a reduction in emotionality attrib- 
utable to 3°C. hypothermia during Week 3 on 
any measure suggests that other factors are 
also involved in setting a critical period. It
must also be remembered that there have 
been reports of handling only during the lat- 
ter half of the preweaning period, effectively 
reducing emotionality as measured by open- 
field defecation and ambulation (Denenberg 
& Smith, 1963). Unless it can be assumed 
that there is some hypothermia accompanying 
removal from the nest during this period, 
which seems unlikely in view of Schaefer’s 
(1967) report, some mechanism other than 
cooling must necessarily be responsible. More 
importantly, although there are indications 
that handling prior to weaning is considerably 
more effective than handling after weaning 
(Levine & Otis, 1958), nonetheless, significant 
treatment effects have been demonstrated for 
postweaning treatments. Henderson (1966a) 
obtained lower adult emotionality (open-field 
test) in animals handled for 1 day following 
weaning, compared with untreated controls 
(shocked animals were actually more emo- 
tional). Similarly, Meyers (1965) reported 
that a group “gentled” immediately after 
weaning was less emotional as measured by 
a home-cage emergence test in adulthood. 
Clearly these latter effects cannot be attrib- 
uted to maternal influences (see below), and 
it is also fairly clear that cooling is unlikely 
to be important at this relatively advanced 
age, so again some other mechanism must 
necessarily be involved. Conceivably these 
postweaning treatment effects might represent 
conditioning or learning. Henderson (1966b) 
has examined this possibility for shock treat- 
ment effects, and concluded that conditioning 
may be involved, but that some more general 
change in arousal or emotionality also plays a 
part.

In view of these kinds of considerations it 
seems unlikely, then, that the cooling hy- 
pothesis can be made to account for all the 
effects reported. Indeed, in order to be able to 
explain both the fact that (a) under some 
circumstances handling in the absence of 
cooling is not an effective treatment (Hutch- 
ings, 1965, 1967; McIver & Camp, 1966; 
Nielson & McIver, 1966; Schaefer, 1963)
and (b) handling can be effective even when
the amount of hypothermia involved is nil 
(McIver et al., 1968) or likely to be extremely 
small (Denenberg & Smith, 1963; Hender- 
son, 1966a; Meyers, 1965), it seems necessary
to assume that at least two mechanisms are 
involved. One of these must be based on cool- 
ing, with the nature of the second being un- 
clear but possibly involving some form of 
direct action. The alternative would be to 
attempt to reconcile these two sets of findings 
by subsuming them under some more in- 
clusive hypothesis. Two possibilities present 
themselves here—an explanation in terms of 
total stimulus input (Denenberg, 1964) or in 
terms of stress (Levine, 1956). The second of 
these is considered in a later section. A hy- 
pothesis framed in terms of total input might 
be made to account for the findings if it is 
assumed that cooling of the pup during the 
first few days of life provides additional stim- 
ulation which contributes to total input; 
something of this kind seems to be implied in 
Hutchings' (1968) suggestion that cooling
effects might be mediated by the activation of 
cold receptors in the skin. This explanation 
seems to differ somewhat from the assumption 
that cooling effects are due to change in on- 
going biochemical processes in the organism 
as a result of low temperature, a distinction 
which could presumably only be resolved by 
physiological investigation. A total input hy- 
pothesis would then be able to account for 
the effects of handling/shocking pups without 
appreciably altering their temperature if the 
stimulation deriving from such procedures
contributes to total input via tactile/pain 
receptors. However, in order to explain the 
finding that pups handled without cooling 
during the first week or so do not differ from 
nonhandled controls, it would be necessary to 
assume further that tactile stimulation arising 
from handling during this period is not an 
important source of stimulus input. At a later 
stage in development, although relatively 
little in the way of stimulation is obtained 
from the negligible amount of hypothermia 
consequent on handling, stimulation from oth- 
er sources, for instance, tactile input, becomes 
important as the sensory and perceptual ca- 
pacities of the pup develop. The difficulty 
here would be in explaining the finding of
McIver et al. (1968) that on some measures
handling without cooling during the first 
week does appear to be an effective treatment. 

A total stimulus input hypothesis, then, 
might be made to encompass the work on 
cooling if certain assumptions are made. The 
choice seems to be between postulating one 
relatively nonspecific mechanism of this type 
or distinguishing two (or more?) separate 
mechanisms based on a more specific categori- 
zation of the stimulation, for instance, as 
“hypothermia,” “tactile input.” This point 
crops up again in connection with the third 
type of hypothesis considered in this review, 
which asserts that maternal factors may be
important in mediating treatment effects.


The Maternal Behavior Hypothesis 


The possibility that the effects of infantile 
treatment may be mediated not by the direct 
action of the stimulation on the pup but rather 
indirectly through the maternal behavior 
mechanism was recognized relatively early 
(Levine, 1959; Schaefer, 1957). Both work- 
ers examined the possibility that treatment 
may in some way interfere with maternal be- 
havior, since the pups are normally removed 
from the mother, and both tackled the prob- 
lem by removing the mother from the nest, 
leaving the pups in situ. Levine failed to find 
any effect on offspring behavior of removing 
the mother daily for a 5-minute period, and 
Schaefer failed to find any significant effect 
even when the mother was absent from her 
litter for up to 6 hours per day on the 21 days 
following birth. Accordingly, the maternal 
interference hypothesis was rejected in favor 
of direct action (Levine, 1962). Du Preez 
(1964) has also reported a failure to find an 
effect attributable to removing the mother 
from rat litters for 15 minutes a day on the 
first 25 days of life, considering a variety of 
adult behavior measures. Interestingly, it 
must be assumed that removing the mother 
from the litter does not result in scattering of 
the pups (at least in these studies); other-
wise, in view of the apparent importance of 
cooling during the first week or so, we might 
expect an effect due to cooling alone here. 
Schaefer did actually find a slight (nonsignifi- 
cant) reduction in adult emotionality as a 
result of maternal separation, a result which
he later (1967) interpreted in terms of the 
slight cooling undergone by the pups. It must 
be emphasized that the possibility of cooling 
cannot strictly be excluded in the case of 
some of the studies reported below, and so the 
source of the effects obtained must remain 
unclear. 

More recently, the maternal behavior hy- 
pothesis has been reconsidered in the light of 
the finding that the development and main- 
tenance of normal maternal behavior pat- 
terns depends to an extent upon stimulation 
arising from the litter itself (Rosenblatt & 
Lehrman, 1963). Young (1965) suggested the 
possibility that infantile treatments may serve 
to change the stimulus properties of the neo- 
nate in such a way as to influence maternal 
behavior. It may be, then, that treatment 
effects are due not to the direct action of the 
stimulation but to the induced changes in ma- 
ternal behavior. It seems fairly clear that for 
such a mechanism to operate, treatment must 
in fact change the pups in some fairly direct 
fashion—but these changes are superficial in 
the sense that they themselves are not re- 
sponsible for treatment effects. Rather these 
effects are brought about by the mother 
reacting differently to treated pups on the 
basis of these changes. Richards (1966) and 
Meier and Schutzman (1968) have argued 
that such a mechanism might be important in 
mediating treatment effects. 

If such a mechanism exists it should be 
possible to detect differences in maternal be- 
havior between mothers of treated and un- 
treated litters, provided suitable techniques 
can be found. Ressler (1962) has shown that 
strain differences in parent-offspring contact
behavior in mice can be detected by observa- 
tional techniques, and Barnett and Burn 
(1967) have used such a technique to examine 
parental behavior in mouse litters subjected 
to experimental treatments, recording the 
amount of contact given by both parents to 
offspring during 5-minute periods on Days 
6-10. It was found that significantly more 
contact was given to pups that had been ear- 
punched on Day 6 and not disturbed there- 
after than to untreated controls. Handling for
a short period each day did not increase con- 
tact over and above that attributable to ear- 
punching, but exposure out of the nest at
34°C. for a 90-minute period each day did. 
A rather different technique has been used by 
Young (1965), involving examination of the 
retrieving response shown by the mother to 
rat pups outside the nest. The treatments em- 
ployed were hypothermia and rotation in a 
drum; in some cases, half of the pups from a 
litter were subjected to one of the treatments 
and the other half were untreated. Pups were 
placed in an arena attached to the nest-cage. 
Under a choice condition, with pairs of pups 
in the arena, there was a tendency for un- 
treated pups to be retrieved first, with a con- 
sequently longer retrieval latency for treated 
pups. In addition, retrieval latency was longer 
for both treated and untreated pups in mixed 
litters, compared with the latency for pups 
from a litter comprised entirely of untreated 
pups. This study directly implies a change in 
the pups as a result of treatment which is 
discernible to the mother and on the basis of 
which she reacts differently to them. Young’s 
result seems to imply that treatment serves 
to reduce maternal responsiveness to the 
young, whereas Barnett and Burn reported 
the opposite finding, though the two studies 
differ in respect of treatment, species, and 
measure of maternal behavior. These experi- 
ments do at least serve to show that differ- 
ences in maternal behavior between mothers of 
treated and untreated litters do appear and 
can be detected by observation, but this by 
no means represents critical evidence for the 
maternal behavior hypothesis since, as Hutch- 
ings (1967) has pointed out, it remains to be 
demonstrated that it is these differences and 
not the treatment per se (direct action, cool- 
ing) which mediate treatment effects. Even if 
it should prove possible to relate the extent of 
maternal behavior change and the magnitude 
of treatment effects, these two may still only 
be incidentally correlated via the treatment 
variable. 

It is also pertinent to ask just what stimu- 
lus properties of the pup are changed by 
treatment to allow discrimination by the 
mother. Hutchings (1967) has argued that if 
altered maternal response occurs on the basis 
of the lowered temperature of the litter as a 
result of removal, it should be possible to 
eliminate treatment effects by rewarming the 
litter before returning it. This was achieved
by placing the litter in an incubator at 33°-
34°C. for 5 minutes following cooling treat- 
ment. Despite this, cold-treated groups still 
differed significantly amongst themselves and 
from nontreated controls. It remains possible 
that the mother responds on the basis of other 
cues, possibly olfactory ones, though it would 
be difficult to see how these could be related 
to the rate of temperature loss in pups, which 
Hutchings reported to be the critical variable. 

In order to disentangle the possible effects 
of maternal behavior from the possible effects 
of direct action of the treatment, it is clearly 
necessary to set up an experiment in which 
maternal behavior is manipulated as a variable 
without imposing a treatment on the pups. 
This would allow a statement of the role of 
maternal behavior factors, although only by 
inference could we extend such findings to the 
role of maternal behavior in treatment stud- 
ies, and it would remain possible that both 
maternal and other mechanisms may be op- 
erative here. With this cautionary note in 
mind, we may examine a body of evidence 
relating to the effects of maternal behavior on 
offspring behavior. 

Differences in maternal behavior can be ob- 
tained in several ways: (a) By treating the 
mother subsequent to the birth of her litter. 
Denenberg, Ottinger, and Stephens (1962) 
found that shocking mothers during this
period, or rotating them to litters other than 
their own, produced a significant increase in 
the emotionality of the offspring, compared
with nontreated control litters. Techniques 
such as this are unsatisfactory inasmuch as 
they involve removal of the mother from the 
nest, so admitting the possibility that the 
results are due to cooling of the litter in her 
absence. In addition, as La Barba (1967) has 
pointed out, shocking and rotating mothers 
between litters might well give rise to some 
serious disruption in maternal behavior and a 
deficiency in the care of the pups. La Barba’s 
attempt to overcome this objection by hand-
ling mothers after the birth of their litters is
less objectionable but suffers from the same
criticism concerning cooling. Results from this
type of study are necessarily equivocal. (b)
Better evidence comes from those studies in
which maternal behavior is manipulated by
treating the mother before the birth of her
litter, and preferably before pregnancy. 
Denenberg and Whimbey (1963) used mothers 
that had been handled/not handled as pups. 
With appropriate cross-fostering techniques, it 
proved possible to isolate a postnatal maternal 
component (among others) affecting the sub-
sequent emotional behavior of the offspring.
Joffe (1965) has isolated a similar component 
for females subjected to stress in maturity. 
(c) In similar fashion, a maternal factor may 
be isolated by making use of existing between- 
or within-strain variation in maternal behav- 
ior. Ottinger, Denenberg, and Stephens 
(1963) employed mothers of high, moderate, 
and low emotionality (as determined by a 
prior open-field test) and a technique of 
rotating mothers from litter to litter. More 
emotional mothers reared offspring which were 
more active when tested as adults in the open 
field. Subsequent experiments with cross-
fostering techniques extracted both a “pre-
natal genetic” and a “postnatal maternal”
determinant of offspring behavior.

Studies from Categories b and c above pro- 
vide strong evidence that differences in mater- 
nal behavior can be responsible for affecting 
offspring behavior in ways which appear to 
be closely similar to those obtained by treat- 
ing the pups. As has been noted above, this 
cannot be taken as evidence for the operation 
of maternal influences in treatment studies. 
In fact we may well ask how maternal re-
action serves to influence offspring behavior,
and it seems likely that the mechanism must 
in some way involve differential amounts of 
contact between mother and pups or differ- 
ences in the quality of maternal care received. 
This being so, maternal influence may in fact 
be interpretable in terms of direct action, 
that is, in terms of the amount of tactile 
stimulation received by the pups from the 
mother, or even in terms of cooling. Clearly, 
differences in responsiveness of mothers to 
their pups might result in different degrees 
of cooling if there are differences in the
length of time pups are left unattended or
allowed to remain outside the nest. Such pos-
sibilities must be considered before we can
conclude that there is a specific maternal be-
havior mechanism, that is, that postnatal ma-
ternal determinants of offspring behavior arise
through specific types of response made by the
mother toward her pups. It is clear that 
existing studies are equivocal on this point. 

As an example we may take Newell's 
(1967) study, in which it was specifically 
argued that Denenberg's (1964) hypothesis 
may be tested by separating the mother from 
her pups for different periods; pups deprived 
of maternal care for a long period should be 
more emotional as adults than those deprived 
for a short period, since the former receive 
less stimulation from the mother and so have 
a smaller total input. Mothers were removed 
from mice litters for a 12-hour period on
either 4, 8, or 12 days prior to weaning. 
Newell concluded that the findings failed to 
offer conclusive support for the hypothesis 
since although in one strain maternal depriva- 
tion increased emotionality (open-field test), 
in a second it reduced it. A similar experi- 
ment has been reported by La Barba, 
Lutz, and White (1968). Female mice were 
removed from their litters on a “1 hour
on, 6 hours off” cycle on Days 2-18. This
procedure significantly reduced activity scores
of the offspring measured at maturity, com-
pared with mice from control litters. These
studies are interesting in that they report 
significant effects attributable to maternal 
deprivation, in contrast to the work of 
Schaefer (1957), Levine (1959), and Du 
Preez (1964) mentioned earlier, a fact which 
may reflect their use of mice or, alternatively,
their use of relatively long periods of sepa-
ration. What they do not do is allow any
conclusions to be drawn concerning the media- 
tion of the effects. Removing the mother not 
only reduces the amount of tactile stimulation 
received by the pups, but also leaves them 
open to cooling and deprives them of specific 
patterns of maternal care. 


The Stress Hypothesis 


Levine (e.g., 1956) has suggested that the 
essential feature of the treatments employed 
in infantile stimulation studies is that they 
subject the pup to stimulation which is in 
some way “noxious” or “stressful” and
further that stress in infancy serves to 
“immunize” the animal against excessive re- 
action to stress in adulthood. Such an 
interpretation assumes that treatments act
directly on the young organism and serve to
modify the development of physiological sys-
tems underlying stress reactivity. This type
of hypothesis can be made to account for the 
results of most, if not all, infantile treatment 
studies. It can obviously explain the effects 
of handling, shocking, or otherwise directly
stimulating the pups, providing that it can be 
assumed that a particular type of treatment 
is stressful. It is able to account for those 
studies which have examined the effects of 
cooling the pups if the reasonable assumption 
is made that such a procedure is stressful. In 
order to encompass the “handling without
cooling” studies it is necessary to assume that
handling during the first week or so is not 
in itself stressing, an assumption which is per- 
haps not too unreasonable in view of the 
poorly developed sensory systems of the 
neonate. Handling of the older, better devel- 
oped pup may then be viewed as stressful. 
Similarly, the effects of maternal deprivation 
might be predicted, since this is likely to 
stress the pup either by cooling it or reducing 
its milk supply. 

This type of hypothesis does, however, suf- 
fer from the major shortcoming that since 
an adequate definition of “stress” is lacking 
there is some circularity inherent in it. There 
is no independent means of establishing 
whether a given treatment is stressful to the 
pup other than by observing its effects on 
later behavior. This objection may well be 
overcome by a more extensive analysis of the 
physiological changes in the pup at the time 
of treatment. If it could be shown that all 
infantile treatments affect a common physio- 
logical system, and that this system is one 
which is implicated in the organism’s reaction 
to stressful situations, then the use of the 
stress concept would be fully justified. A 
conclusion here must await physiological work 
in this area. 


Conclusions 


Perhaps the clearest point to emerge from 
this consideration of infantile stimulation is 
that existing studies do not allow us to draw 
any strong conclusions as to the mediation of 
treatment effects. 

The role of maternal influences is by no 
means clear, and in any case it may prove
possible to subsume maternal effects under
the more general hypothesis of "stress" or 
"total stimulus input." On the other hand, 
there may be a more specific maternal mecha- 
nism: further work in this area is indicated. 

As to the role of hypothermia, this has been 
very strongly implicated in the mediation of 
handling effects during the first week or so 
of life. The major question seems to be 
whether or not such a mechanism should be 
regarded as quite distinct from the mecha- 
nism underlying the effects of handling later 
in the preweaning period. Can cooling effects 
in fact be subsumed under the stress or total 
stimulus input hypotheses? Again, the an- 
swer to such problems may lie in the realm 
of physiology. Whether we regard cooling 
during the first week and handling during 
later periods as operating through the same 
or different mechanisms must depend in the 
final analysis on how and through what sys- 
tems their effects are mediated. To speculate 
further, we might wonder whether or not it 
will ultimately prove possible to integrate the
findings from cooling and handling studies 
with those from studies of maternal influences, 
through a fuller understanding of the physio- 
logical systems involved. 

Finally, it is as well to note that we have 
made the implicit assumption that such 
diverse treatments as handling, shocking, and 
cooling of the pup produce effects in identical 
response systems in the adult, an assumption 
which has been made almost universally but
never rigorously tested. In particular it is 
important to ask whether the concept of 
“emotionality” as it has been applied to 
infantile stimulation effects is an entirely ade- 
quate one. La Barba et al. (1968), for ex- 
ample, have cautioned against the indiscrimi- 
nate use of this term on the basis of the 
finding that treatments may affect activity 
measures without affecting defecation. Indeed 
there are indications that a given treatment 
may affect open-field ambulation but not 
defecation (Denenberg, Carlson, & Stephens, 
1962; Ottinger et al., 1963) or conversely 
that defecation may be affected but not am- 
bulation (Denenberg, Ottinger, & Stephens, 
1962). The work of McIver et al. (1968) 
provided a very clear demonstration that not 
only do the effects of a given treatment vary
considerably according to the measure of
“emotionality” employed, but also the conclu-
sions to be drawn differ in a very fundamental
way depending on the measure selected. We
are thus brought back to Levine's (1962)
plea for an adequate specification of both
independent and dependent variables.

Constraints upon inquiry in personality imposed by current research methods
were examined by (a) a survey of empirical work published in two major
personality journals and (b) a consideration of methodological and ethical 
issues raised in recent research criticism. Review of samples, research proce- 
dures, and social-psychological context in 226 empirical studies revealed that 
current methodological practices are incapable of approaching questions of 
real importance in personality and involve serious problems beyond those noted 
in recent research criticism. Recent proposals for methodological reforms offer 
only partial solutions and require further attention to the personal involvement 
and responsibility of investigators. This paper proposes a conceptual schema 
for ordering personality research strategies, a distinction between “contractual” 
and "collaborative" models of subject-experimenter relationships, and sug- 
gestions for increasing the relevance and responsibility of personality research. 


The greatly increased volume of empirical 
work on personality in recent years, and the 
appearance of several new textbooks on per-
sonality research and theory (e.g., Maddi,
1968; Mehrabian, 1968; Mischel, 1968; 
Schontz, 1965), may be read as indications 
of flourishing inquiry in personality. Yet there 
is a growing concern (Adelson, 1969; Sanford, 
1965) that personology is, in fact, languish- 
ing; that in adopting the research values and 
strategies of “process” psychology, contempo- 
rary investigators have relegated the psychol- 
ogy of “person” to a peripheral world of the 
psychotherapist, the behavior modifier, and 
the encounter group. Moreover, the increasing 
concern with general issues emerging in con- 
temporary research on research (Argyris, 
1968; Kelman, 1967; Orne, 1962; Rosenthal, 
1966; Schultz, 1969; Stricker, 1967) has a 
particularly keen significance for the field of 
personality study. A reexamination of the 
status of personality research may help to 
define these issues. 

There is a clear consensus among contempo- 
rary personologists concerning the goals, 
methods, and values informing personality 
research. While governed by the scientific
principles and ethical concerns common to all
psychology, personality has a unique, central 
role in the field. As Baughman and Welsh 
(1962) observed: 


Personality bridges the two basic branches of psy- 
chology—experimental psychology, which tends 
toward the biological sciences, and social psychology 
which is closely allied to the social studies . . . the 
concepts of personality study can tie together the 
views of these two areas and minimize the danger 
of dehumanization . . . [through clear focus upon] 
our unit of study, individual man [pp. 16-17]. 


The program of personality research has been 
clearly restated by Maddi (1968): 


The personologist is interested in universals . . . in 
the commonalities among people [as well as] . . . 
in the attempt to identify and classify differences 
among people . . . The personologist is rather un-
usual in not restricting himself to behavior easily
traceable to social and biological pressures of the
moment . . . Of all the social and biological scien-
tists, then, the personologist believes most deeply in
the complexity and individuality of life . . . his
emphasis [is] upon characteristics . . . that show
continuity in time . . . that seem to have psycho-
logical importance . . . that have some ready rela-
tionship to the major goals and directions of the
person's life . . . The personologist is interested in
all rather than only some of the psychological behav-
ior of the person . . . Finally . . . personologists
. . . are primarily interested in the adult human
being... the fruit of development—a congealed 
personality that exerts a pervasive influence on 
present and future behavior . . . [pp. 7-9; original 
italics]. 

Toward achieving these goals, the personolo-
gist employs a wide range of methods: “the
cross-cultural, the developmental, the clinical,
the experimental, and the quantitative
[Murphy, 1968, p. 19].”


TABLE 1

SUMMARY OF SELECTED ASPECTS OF RESEARCH METHODS IN 226 PERSONALITY STUDIES

Subject samples

  Sex composition              N      %
    Male                      71     31
    Female                    33     15
    Both (specified)          77     34
    Both (unspecified)        22     10
    Indeterminate             23     10

  Age-role composition         N      %
    Preschool                  2      -
    Elementary                16      7
    Secondary                 15      7
    College, psychology      110     44
    College                   50     22
    Adult, general             3      1
    Adult, special            13      6
    Multiple                  17      8

Research strategy and procedures

  General strategy             N      %
    Experimental             177     78
    Field                      7     20
    Combined                   2

  Time span of inquiry         N      %
    Single sessions          177     78
    Less than 1 month         34     15
    Over 1 month              15      7

  Cognitive clarity            N      %
    Deception                129     57
    Debriefing specified      42    (32)b
    Interpretive feedback      1      -

a Includes studies with earlier pretest administered in regular classes.

b Percentage of deception studies.

The breadth and depth of the current re- 
search stemming from this tradition may be 
best demonstrated by a review of the current 
personality literature. Whom are we study- 
ing? How much are we prepared to learn 
about an individual? In what settings and 
relationships? Answers to such questions, im- 
plicit in the research methods of the field, 
operate to structure and to limit the pos- 
sibilities of new knowledge. Assessment of a 
broad sample of current published research 
on personality may provide an indication of 
whether unexamined assumptions of investiga- 
tors may be restricting, rather than advancing, 
knowledge about the organization of psycho- 
logical processes within the person. 

The present report, based upon such an 
assessment of current research, was guided by 
three purposes: an examination of constraints 
imposed by research methods, consideration of 
methodological and ethical issues posed in 
recent research criticism, and presentation of 
some alternative ways of solving the problems 
encountered. 


A SURVEY OF CURRENT PERSONALITY
RESEARCH 


Articles appearing in the 1968 volumes of 
two major journals publishing substantive re- 
search on personality (Journal of Personality 
and Journal of Personality and Social Psy- 
chology) constituted the sample of the review. 
Since the concern was with the scope and 
structure of inquiry, rather than its content, 
subject matter was disregarded, and the focus 
placed on selected aspects of research method: 
composition of subject samples, general re-
search strategy, and social-psychological as-
pects of the research. Major findings, based
upon tabulations for 226 substantive articles
(excluding a few editorials, methodological
and animal studies, and monograph supple-
ments) are summarized in Table 1.
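
The percentages reported in Table 1 are simple proportions of the 226 surveyed articles. As a minimal sketch (not part of the original survey), the following Python snippet recomputes them for two of the tabulations, sex composition and time span of inquiry, using the counts printed in the table. The row labels "Male" and "Female" are an assumed reading of the table, and the helper names `percent_of` and `summarize` are hypothetical, introduced only for illustration.

```python
# Illustrative recomputation of two Table 1 tabulations.
# Counts are those printed in the table; the labels "Male"/"Female" and the
# helper names below are assumptions made for this sketch.
TOTAL_STUDIES = 226

sex_composition = {
    "Male": 71,
    "Female": 33,
    "Both (specified)": 77,
    "Both (unspecified)": 22,
    "Indeterminate": 23,
}

time_span = {
    "Single sessions": 177,
    "Less than 1 month": 34,
    "Over 1 month": 15,
}


def percent_of(count, total=TOTAL_STUDIES):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


def summarize(tally):
    """Check that a tally covers every study, then convert it to percentages."""
    if sum(tally.values()) != TOTAL_STUDIES:
        raise ValueError("tally does not account for all studies")
    return {label: percent_of(n) for label, n in tally.items()}


sex_pct = summarize(sex_composition)
span_pct = summarize(time_span)

for name, table in (("Sex composition", sex_pct), ("Time span", span_pct)):
    print(name)
    for label, pct in table.items():
        print(f"  {label}: {pct}%")
```

Each tally is first checked against the total of 226, so a miscopied count would raise an error rather than silently produce a misleading percentage.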


Whom Are We Studying?

choice. The expansion of inquiry to include
a broader sample of adults in a variety of
community settings (e.g., pregnant mothers, 
African tribesmen, racetrack patrons) is a 
heartening development. However, with a few 
exceptions, community adults were studied in 
such a limited and trivial fashion as to con- 
tribute very little to knowledge of personality. 
Males and females were represented in ap- 
proximately a 2 to 1 ratio, a finding which 
seems to suggest some correction of the seri- 
ous imbalance in sex composition of samples 
noted in a review of the literature nearly a 
decade ago (Carlson & Carlson, 1960). How- 
ever, upon closer examination, this “improve- 
ment" appears a remarkably fragile basis for 
the extension of knowledge. For the sexes are 
typically studied in segregation: approxi- 
mately half of the studies used subjects of 
only one sex, and many of the remaining 
studies used single-sex groups for separate 
parts of an investigation. Moreover, one-fifth 
of the studies either failed to indicate propor- 
tions of males and females in the sample or 
to indicate whether sex varied at all. 
Investigators' remarkable lack of interest in
an intuitively (and empirically) important 
aspect of personality appeared in several ways. 
Among the studies that could have tested 
for sex differences, less than half reported 
such tests. Yet in 51 studies where sex differ- 
ences were examined, significant effects of sex 
were found in 74% of the studies. Meanwhile,
an implicit awareness of sex differences may 
be seen in a nascent trend toward using 
males-only in studying achievement, bargain-
ing, etc., and females-only in studying
altruism, cooperation, and the like. A most
illuminating instance of how sex differences 
are treated in current research is to be found 
in a study by Wilson and Insko (1968). In 
an investigation which combines most of the 
current preoccupations of the field (e.g., 
prisoner's dilemma, stooges, evaluative ratings 
of others, and a "theoretical" controversy), 
clear-cut findings supporting the major hy- 
pothesis were reported on the basis of tables 
(see Table 2) in which even clearer sex 
differences failed to capture the attention of 
the investigators or of journal reviewers. 

Given the compelling evidence of the per- 
vasiveness and importance of sex differences 
in personality, both from present "internal" 
data and a wealth of "external" evidence 
(Maccoby, 1966), current research methods 
seem designed to avoid, rather than to 
confront, a central problem of personality 
organization. 


How Do We Study Persons? 

Experimental methods predominated in cur- 
rent research, with over half of the published 
studies employing manipulative procedures. 
Correlational studies (broadly defined) ac- 
counted for most of the remaining work, 
although a small, but promising upsurge of 
observational studies in naturalistic settings 
should be noted. The sole study in which 
experimental and field methods were com- 
bined, and a basic finding established with 
two appropriate samples, was contributed by 
a team of sociologists (O’Toole & Dubin,
1968).


TABLE 2

MEAN STOOGE IMPRESSION

[Cell values are not recoverable from the source. Column groups: no interval between sessions and one week between sessions, each split into no measurement delay and measurement delay, with male and female subjects reported separately. Rows: Competitive-Cooperative, Cooperative-Competitive, Difference, Direction (recency in every legible cell).]

df = 1/64; all other effects are nonsignificant. The recency effect is significant at the .01 level.

Reprinted from an article by Warner Wilson and Chester Insko published in the May 1968 Journal of Personality and Social Psychology. Copyrighted by the American Psychological Association, 1968.


How Much Are We Prepared to Learn About
a Person?


Extremes of a “comprehensiveness” dimen-
sion are represented by studies in which sub-
jects left no trace of their personal participa-
tion, merely contributing isolated bits of be-
havior to a data pool, and a few in which
subjects provided exhaustive data on a battery
of tests and biographical inventories. How-
ever, the typical study represented an indi-
vidual in terms of his sex (sometimes), treat-
ment condition, performance scores, and
ratings of partner or experimenter in post-
test inquiry. Although the literature as a
whole has elicited a wide range of potenti-
ally important information about persons, no
single investigation either noted or utilized
much information about any individual sub-
ject. Thus the task performances of subjects
in current research remain uninterpretable as
personality data in the absence of anchoring
information.

An interesting sidelight is the new role of
introspection in contemporary research. Apart
from a few studies in which the subjects’ ac-
count of private experience constituted pri-
mary data, introspective reports are currently
used (a) in deriving pretest scores as a basis
for assignment to experimental groups or (b)
“as a check on the effectiveness of the experi-
mental manipulation.”

The time span of contemporary inquiry is
short. The vast majority of published work 
was based upon a single session; less than 
one-fifth of reported studies involved more 
than a 2-week period, and rarer still were the 
few studies involving follow-up over signifi- 
cant periods of time. The only examples of 
investigators’ extended delay of gratification 
were two follow-up studies (over 15 and 18 
months) of smoking behavior (Johnson, 
1968; Mann & Janis, 1968) and a 3-year 
follow-up of mental retardates (Zigler, Balla, 
& Butterfield, 1968). 


What Is the Interpersonal Context of 
Research? 


With a few notable exceptions, the current 
mode of inquiry involves highly imper- 
sonal subject-experimenter relationships, 
“conscripted” subjects who are expected (and 
expect) to conform to research requirements
with little explanation and little interpretive
feedback.


Deception remains a salient feature of ex- 
perimental inquiry. Over half of the total 
sample and 73% of the experimental studies 
relied upon deception as a means of manipu-
lating major variables. There are, as Stricker
(1967) has pointed out, many ways of de- 


Kanouse, 1968) which illustrates the poten- 
tialities of this research tradition: 


A student who (a) volunteered to participate in a
study of “regional speech differences” for $1.50, (b)
tape-recorded a prepared speech, and was (c) told
that his recordings would be used in a nationwide
survey and in classes at the university, (d) sent to
another building to be interviewed by a fake “assist-
ant to the dean,” (e) detained by a fake schedule
delay, (f) induced to participate in another survey
while waiting, (g) induced to reaffirm his “choice”
to participate, (h) told that he should report to
another building a half-mile away, (i) rescued by
another stooge who provided an empty classroom,
(j) asked to write an essay contrary to his own
beliefs, (k) told that his essay would be published
in the campus newspaper, (l) required to read a
prepared list of arguments, (m) required to reaffirm
his free choice to participate; he then (n) wrote an
essay, and was (o) required to “go down four flights
of stairs, traverse approximately a block to a(nother)
building . . . and go up three flights of stairs . . . ,”
(p) interviewed by a fake “assistant to the dean”
who expressed interest in the student's opinions,
(q) asked to fill out an attitude questionnaire, (r)

including an explanation of the study and hypothesis
involved . . . [Kiesler et al., 1968, p. 334].3


3 Incredibly, a profi
mitment” Prior to th
Praised this investigati
experimental design,

Debriefing was explicitly reported in only
one-third of the deception studies. Moreover,
debriefing took a number of quite different
forms. While a few studies reported thorough,
integrative interpretations of experimental 
manipulations, more characteristic were sev- 
eral other types of debriefing: (a) Undoing 
(e.g., “After the subject completed the ques- 
tionnaire, Dr. ... entered the office and 
debriefed him. Because all of the subjects had 
received a rather negative evaluation, they 
were delighted to learn that the evaluation 
was preprogrammed rather than an accurate 
reflection of their creative ability"—Aronson 
& Cope, 1968, p. 10); (b) Rationalization 
(e.g., “After completing the questionnaire, the
subject was queried as to possible suspicion 
and the purposes of the experiment and the 
need for deception explained”—Helmreich & 
Collins, 1968, p. 78) ; and (c) Silencing (e.g., 
“Before leaving, the experimenter revealed
the nature of the study and got the subject to 
promise not to discuss it with anyone"—Mills 
& Jellison, 1968, p. 61). 

While in many studies the nature of the 
tasks may be presumed to be obvious, and 
perhaps meaningful to the subjects, it is sur-
prising that subjects' experience of research
participation was not considered worthy of
mention. Only 1 of the 226 studies (Steiner,
1968) noted provision for giving subjects a
report of the findings of the investigation.
When one considers that the vast bulk of
the year's published research was made pos-
sible by the requirement of research participa-
tion as a “learning experience” in psychology
courses, this lack of concern for subjects'
cognitive clarity is remarkable.

One further aspect of the subject-experi- 
menter relationship proved impossible to tabu- 
late. Only in a small proportion of cases was 
it possible to determine whether the investi-
gator or anonymous assistants had “run” the
subjects or whether, in fact, the experimenter
had ever seen his subjects.


Questions We Can Neither Ask Nor Answer 


It is instructive to consider the range of 
personological questions which cannot be in- 
vestigated by our current research methods. 
While the year’s research literature provides 
a few isolated exceptions to many of the 
following generalizations, the central tendency 
of our current modes of inquiry is so strong 
as to mark a real barrier to knowledge. 


We cannot study the organization of per- 
sonality because we know at most only 
one or two “facts” about any subject. We 
cannot study the stability of personality, 
nor its development over epochs of life, be- 
cause we see our subjects for an hour. We 
cannot study the problems or capacities of the 
mature individual, because we study late ado- 
lescents. We cannot study psychosexuality, 
because we avoid looking at distinctive quali- 
ties of masculinity and femininity as a focal 
problem. We cannot study how persons strive 
for their important goals, because we elect 
to induce motivational sets. We cannot study 
constitutional, temperamental variables be- 
cause (apart from a few glances at increments 
in galvanic skin response under stress) we 
do not consider biological bases of personality. 
We cannot study the development and power 
of friendship—nor the course of true love— 
because we choose to manipulate interpersonal 
attraction. 

Such a list of cognitive deficits in the col- 
lective psyche of the field might be extended 
at great length. Personality psychology would 
seem to be paying an exorbitant price in 
potential knowledge for the security afforded 
by preserving norms of convenience and meth- 
odological orthodoxy. Must these important, 
unanswered questions be left to literature and 
psychiatry? If so, what would be the use of 
our work? 

Obviously, no single scientist, no single 
study, no single research tradition can pos- 
sibly deal “scientifically” with anything so 
complex as a whole person. But the attempt 
can be made collectively and cumulatively. 
The present impoverishment of personality 
research is distressing because it suggests that 
the goal of studying whole persons has been 
abandoned. However, the fragmented and 
limited quality of current research may stem 
less from myopia or opportunism than from 
the absence of a conceptual framework for
guiding inquiry in personality.


A Typology of Research Approaches

Kluckhohn and Murray (1949) remind 
us that every man is “ . . . like all other
men, like some other men, and like no other
men [p. 35],” and in an insightful truism
have also offered an implicit model for order-
ing the scope and methods of inquiry in per-
sonality. The typology of research approaches
suggested by this model corresponds to the 
major traditions from which personality study 
has grown: (a) the experimental methods of 
laboratory psychology, (b) the correlational 
methods of differential psychology, and (c) 
the clinical methods stemming from the tradi- 
tion of French psychiatry and Viennese psy- 
choanalysis. However, the typology has con- 
siderably more than mere historical interest, 
and may serve to illumine the present state of 
personality research and point toward a con-
ceptually based set of solutions to present
problems. 

“. . . like ALL other men.” The psycholo- 
gist working from this perspective seeks uni- 
versals, the discovery of general laws of 
human nature. The emphasis is upon psycho-
logical processes; persons are essentially
“carriers” of the variables under investigation.
Research methods reflect the basic assump-
tions of this approach: persons are inter-
changeable; random assignment to treatment
conditions is employed to insure control of 
idiosyncratic qualities which are “noise” in the 
generalist's inquiry. The basic assumption of 
the equivalence of subjects leads to further 
methodological implications: (a) experimental 
manipulation of independent variables as the 
source of subject variability, (b) dimensional
treatment of psychological variables, (c) rela- 
tive deemphasis upon genetic variation and 
constitutional bases of individuality, and (d) 
emphasis upon situational factors as major 
sources of variation in human nature. Among 
the many current examples of generalist tradi- 
tion in personality research, one notes the 
extensive work on cognitive dissonance, the 
attempts to establish laws of interpersonal 
attraction and impression formation, the con- 
ditions under which cooperation and competi- 
tion are elicited in interpersonal events. 

“. . . like SOME other men.” This tradition
studies psychological processes and their
organization in different kinds of subjects: its 
aim is that of identifying group differences 
that make a difference. Such inquiry estab- 
lishes typologies, charts the influence of mod- 
erator variables, and, substituting measure- 
ment for manipulation, tends to employ cor- 
relational methods (broadly conceived, in all 
their contemporary variations). Further, the
differential approach tends to (a) seek natural
occurrences of the phenomena under investigation,
(b) emphasize discontinuities—
whether of developmental level, character 
types, or social class—as correctives to an 
assumption of continuity, (c) emphasize both 
genetic variation and cultural determinism as 
sources of critical differences, and (d) emphasize
intrinsic intrapersonal structures as base
lines for further inquiry. Current examples 
of the differential approach in personality 
research include inquiry on the differences 
between internal and external controllers, 
different consequences of repression and sensitization
as defense styles, the nature of sex
differences in personality, and studies pointing 
to the limits of generalists' formulations—for
instance, Bishop’s (1967) demonstration that
predictions from cognitive dissonance theory
fail to fit the “anal” personality.

“. . . like NO other men.” The clinical
tradition, in its concern for mapping the intricate
organization of psychological processes
within the unique individual, is the prototype;
however, the individual approach includes less 
comprehensive kinds of inquiry, including 
“ipsative” methods (Broverman, 1962), 
“morphogenic” methods (Allport, 1968), a 
concern for “personal constructs” (Kelly, 
1955) and inquiry focused upon the “representative
case” (Schontz, 1965). While the
potency of the case method in the development
of personality study should be so
obvious as to require no special emphasis,
the particular quality of the
(American) Zeitgeist imposes a special problem:
the “clinical” tradition has come to connote
a “helping” orientation totally extraneous
to the method itself. (The relevance of


nd dynamics of the indi- 
ble of representing this 
(c) identify general psychological
problems emerging from the examination
of individual personality, and (d)
provide a field for testing the formulations 
derived from general and differential inquiry. 
Contemporary examples of the individual ap- 
proach include White's (1966) intensive 
studies of three “normal” personalities over a 
significant period of the life span, or, in a 
different mode, Alfert’s (1967) demonstration 
that ipsative analyses of data on stress re- 
actions revealed order and coherence which 
had eluded the generalist's “normative”
approach to the same data. 

Some of the conceptual and methodological 
issues involved in a comparable tripartite 
schema are presented in Emmerich’s (1968) 
discussion of “classical,” “differential,” and 
“ipsative” models of development. However,
while Emmerich's three models are presented 
as competing candidates, the three approaches 
presented here are conceived of as comple- 
mentary rather than competitive alternatives. 
Tentative knowledge gained from any one approach
must ultimately be weighed by alternative
methods. A "general law" which operates
only in certain kinds of people, or which
can predict little of significance in an indi- 
vidual's life may prove very trivial in under- 
standing personality. Similarly, an elaborate 
case study which neither inspires nor tests 
inquiry of a more general nature is basically 
irrelevant to personology. 

This schema offers another way of assess- 
ing the current status of personality research: 
How well does current inquiry represent the 
person in these three fundamental aspects? 

The 226 articles reviewed above were re- 
examined in terms of the present typology, 
and each article assigned to one of three 
categories as follows:

1. General. The research method disre- 
garded preexisting subject variables and used 
random assignment to treatment conditions 
and/or treated subjects’ scores as a continuous
dimension. 

2. Differential. The research method made 
at least minimal provision for identifying 
group differences on the basis of preexisting 
subject variables. Assignment to this category 
was “lenient” in the sense that a problem 
conceived in general terms was classified as 
“differential” if tests for qualitative differences
(e.g., sex differences) were reported.

3. Individual. The research method in- 
cluded extensive study of one or more indi- 
viduals, and (a) retained individual cases as 
the unit of analysis or (b) included extensive
case examples in presentation of findings. 

The results of this analysis give a clear and 
consistent picture of the “generalist” bias in 
contemporary personality research, with 128 
(57%) of the studies disregarding subject
variables. Ninety-eight (43%) of the studies
classified as "differential" examined a some- 
what limited set of bases of group differences 
(e.g., high versus low anxiety, repressors
versus sensitizers, firstborns versus later- 
borns, males versus females); extreme groups 
defined on single dimensions rather than 
"types" were the norm. Finally, the analysis 
revealed that not a single published study 
attempted even minimal inquiry into the 
organization of personality variables within 
the individual. 

Thus the present analysis provides a fur- 
ther basis for concern about the status of 
personality research. Conceivably, persono- 
logical studies may appear from time to time 
in clinical and psychiatric journals not en- 
compassed in the present review. But even 
if this could be demonstrated, what are the 
implications? Is personology to be left to the 
exclusive concern of the healer rather than 
the scientist? Should personality research be 
redefined as "the experimental study of personality
fragments in artificial situations?"
It is not so much the pretensions of the field 
as the neglect of legitimate and necessary 
pretensions which poses problems for the 
serious personologist. 

Possible reasons for the current state of 
affairs, along with suggestions for enlarging 
the scope and relevance of personality study 
are considered in a later section. First, how- 
ever, parallels between the methodological 
problems noted in personality literature and 
the issues raised in recent criticism of general 
psychological literature should be noted. 


ERROR, ETHICS, AND DISCIPLINARY
INTEGRITY 


The pervasive effects of atmosphere, expectancies,
and demand characteristics, long recognized
by some psychologists (Allport, 1968;
Kelly, 1955; Lewin, 1935), and recently rediscovered
and developed as a research area
by a newer generation, now threaten to require
a wholesale reexamination of much of
psychology’s substance. Recently, implications 
of this state of affairs have been examined 
in incisive comments of Adelson (1969), 
Argyris (1968), and Schultz (1969), among 
others.

Adelson (1969), in a brilliant critique of a 
year’s worth of personality research, noted 
the rigidity and irrelevance of current meth- 
odology and the availability of more appro- 
priate models of inquiry, but offered little 
optimism about reform. Argyris (1968) 
pointed to “unintended consequences of rigor- 
ous research,” noting that organization theory 
predicts current problems: the dependency or 
covert hostility of subjects caught in an au- 
thoritarian relationship, the “unionization” of 
subjects, the unintentional programming of 
people to become interpersonally incompetent. 
Schultz (1969) provided historical perspective 
on the role of the subject in psychological 
inquiry and underscored problems noted by 
recent critics of research methods: distortion 
of data through the use of irrelevant and non- 
representative samples, deception of the de- 
ceiver, and the ethical problems involved in 
investigators’ systematic disregard for the 
dignity and welfare of subjects. 

The personologist—for whom all of these 
issues are of the deepest concern—finds that 
additional problems are posed by current re- 
search conventions. Among the unintended 
consequences of acquiescence in these conven- 
tions is the abandonment of the field of 
normal personality as a primary scientific 
enterprise. Surely this is a territory worth 
defending against benign encroachments of 
the experimental and social psychologists 
whose “process” orientation is more consonant 
with their mission, or the “abnormal” clinical 
psychologists whose concern with helping per- 
sons makes their journals and textbooks more 
receptive to person-centered inquiry.

Another serious consequence of acquies- 
cence in current methodology is found in the 
abandonment of students and of curricular 
responsibilities (Carlson⁴). By permitting


4 Carlson, R. (Chm.). Personality in the undergraduate
curriculum. Symposium presented at the
personality courses to become “adjustment”
courses for nonmajors and “theory” courses
for psychology majors, we are implicitly communicating
a disbelief in the value or possibility
of inquiry in personology. Thus psychologists
are unintentionally cutting off sources of
future scholars prepared to confront the intellectual
problems of "personality" in a world
where “depersonalization” has, for some years, 
been both a battle cry and a genuine human 
concern. 

Further, a personologist, relatively sensitive 
to personality consequences of situational 
pressures and role demands, must be particu- 
larly concerned about the other edge of the 
double blade of current methodology. For that 
violation of human dignity experienced by 
subjects in manipulative-deceptive relation- 
ships equally demeans the psychologist who 
adapts to a norm of distrust and comes to 
confuse games with the pursuit of science. 


Toward Solutions 


Once the depth and pervasiveness of the 
problem is genuinely confronted, solutions 
would seem to be well within the power of 
those seriously committed to the integrity and 
survival of the discipline. Several relevant pro- 
posals have been advanced by recent critics 
of the general psychological scene. 

Argyris’ (1968) proposals include the im- 
portant conceptions that subjects should be 
given greater control and influence, longer 
time perspective, and greater internal involvement
in research projects. Noting that contamination
of research by subjects' expectancies
is inevitable, he pointed out that
greater control of such contamination can be
achieved by research methods which increase
awareness of such expectations. Moreover, his
empirical work demonstrating the resistance
to change of “behavior that is internalized,
highly potent, and related to . . . feelings
of intellectual and interpersonal competence
. . .” strongly supports Argyris’ interpretation
that "the more researchers study
such behavior, the less they may need to 
worry about such contamination [p. 195]." 

Schultz (1969) posed two basic dimensions 
of reform: (a) the broadening of inquiry 
to include more representative noncollege 
samples, and (b) development of research 
methods which would reflect ethical responsi- 
bility and a contemporary image of man and 
of the scientific enterprise. 

Toward achievement of broad-gauged sam- 
pling, Schultz gave an ambivalent endorse- 
ment to Rosenthal's (1966) suggestion that 
independent data-collection centers undertake 
execution of studies designed by academic 
investigators, in the belief that standardi- 
zation of experimenter bias and the pro- 
curement of more representative samples 
would correct some of the serious deficiencies 
of current research. While the problem of the 
expense of establishing and maintaining such 
centers is noted by Schultz, more serious prob- 
lems are also involved. From the standpoint 
of the present critique, this proposal would 
seem to exacerbate fundamental problems. By 
increasing the distance between the investigator
and his data, and by decreasing his personal
involvement and responsibility, the
Rosenthal proposal would extend the investi- 
gator's license to ignore the relevance of his 
inquiry and to exploit or ignore dependency 
and counter-manipulation of subjects. Con- 
ceivably, the Rosenthal proposal could have 
some merit in broadening inquiry on certain 
impersonal problems of classical experimental 
psychology. But the overall effects of estab- 
lishing such an elaborate hand-washing ap- 
paratus would seem to be lethal for those 
engaged in issues of personality and interpersonal
relationships.

Toward solution of the second class of
problems—ethics and relevance—Schultz gave 
tentative endorsement to Kelman's (1967) 
proposal of role playing as a research technique
and to Jourard’s (1968) suggestion that
mutual self-disclosure by experimenter and 
subject might dispel the cloud of distrust 
which surrounds much current inquiry. 


Undoubtedly, Kelman’s (1967) suggestion 
for cooperative engagement of the subject’s 
imaginative resources in role playing the con- 
frontation of psychological problems is a 
notable improvement over current deceptive- 
manipulative methods. However, this pro- 
posal does not touch critical issues of problem 
finding or of real-life involvement; and one 
can imagine that the role-playing technique 
might even perpetuate (by legitimizing) the 
systematic irrelevance of much psychological 
research. As Schultz (1969) noted, the value 
of this method would be highly dependent 
upon characteristics of subjects and of sub- 
ject-experimenter relationships. Fundamental 
personological questions (e.g., What kinds of 
people can invest themselves in what kinds 
of role playing?) entirely bypassed in this 
proposal would define the limits of its rele- 
vance. There is a clear (but probably small) 
place for the role-playing technique in investigating
a wide range of theoretically significant
problems; considerable thought should be
given to these limits before the technique is 
blessed as a solution to our current malaise. 

Jourard’s (1968) recommendation of mutual 
self-disclosure poses somewhat different prob- 
lems. As Schultz (1969) noted, the personal- 
ity of the investigator would be a major 
factor—along with issues of the time and ex- 
pense involved. But more serious objections 
should be noted. Admittedly, almost any 
methods of restoring confidence, trust, and 
dignity in subject-experimenter relationships 
could be justified at this point in time. How- 
ever, the correction of a destructive relation- 
ship can, at most, create an atmosphere in 
which genuine research is possible; it does 
not constitute a research method. More- 
over, there are real risks that experimenter 
and subject alike may be seduced by the 
“togetherness” aspects of interaction; that the 
Jourard proposal, in the hands of investigators 
characterologically unsuited to this mode of 
inquiry, might cloud the purposes of scientific 
inquiry as thoroughly as the manipulative- 
experimental techniques in current vogue. 
There are a number of ways of deriving ir- 
relevant gratifications from research, and in 
the absence of scientific commitment and 
conceptual clarity, soft hearts are probably no 
better than hard heads as tools of the trade.

The climate of change, of self-examination,
of genuine concern for reform of psychological 
inquiry is unmistakable. In this context, the 
present critique of several current proposals 
stems from a concern that basic issues may 
be obscured by the sense of urgency toward 
reform. At the risk of oversimplifying several 
thoughtful contributions, current reform pro- 
posals appear to address two quite different 
issues of control and of alienation. Control of 
unwanted variance (whether generated by in- 
appropriate sampling, experimenter bias, or 
subject expectancy) and counteracting forces 
toward alienation (of experimenter from sub- 
ject, of experimental inquiry from real-life 
relevance) are the explicit goals of most cur- 
rent proposals, or the rationales for more 
fundamental ones (e.g., Argyris, 1968). However,
most of the proposals fail to deal explicitly
with issues of relevance of method to
problem, or to locate clearly the responsibility
for various reforms. Some explicit considera- 
tion of these issues seems warranted. 

On the relevance of subject samples. Most
of the current concern about overreliance 
upon undergraduate students as research sub- 
jects (cf. Schultz, 1969) reflects the general- 
ist’s concern with nonrepresentative samples 
which constrain the generality of laws to be 
established from inquiry. There exists suf- 
ficient evidence of “bias” in college samples 
(volunteer bias, birth-order effects, socio- 
economic selectivity) to establish that under- 
graduate students are probably not repre- 
sentative of humankind. 

But a more serious concern may be raised 
about the misuse, rather than the use of col- 
lege students as research subjects. For stu- 
dents, as a group, possess many character- 
istics which make them highly appropriate 
subjects for personality research: they are 
curious, intelligent, motivated to explore their 
lives and experiences, capable of articulate
introspection—and their life situations generally
provide both time and meaningful settings
for research participation. Yet it is
precisely this set of subject characteristics
which tends to be ignored (or violated) in
current research conventions. When the student’s
intrinsic motivation is “controlled” by
external requirements of research participation
and by induction of motivational sets,
when his introspective capacities are constrained
by the formats of rating scales and
checklists, when his curiosity is violated by 
deception and false or partial feedback, the 
very characteristics which recommend him as 
a research subject are thrown away. 

Moreover, certain limitations upon the gen- 
erality of student-based research are equally 
ignored. Clearly, students are "unfinished" 
personalities. Coherent changes in ego structure
within and beyond college years have
been demonstrated in a wide variety of 
studies ranging from Constantinople’s (1969) 
charting of ego changes in questionnaire re- 
sponses over the undergraduate years through 
White’s (1966) intensive longitudinal clinical 
studies of students whose postcollege years 
revealed surprising but theoretically significant
consolidations and restructurings of intrapersonal
and interpersonal dynamics.

While the year’s published personality re- 
search occasionally noted the selection of 
samples considered intrinsically relevant to the 
research problem (e.g., a series of obesity and 
eating studies published in the October 1968 
Journal of Personality and Social Psychology), 
the typical study evidently used college students
simply because they were there.

“Reform” proposals directed toward broad- 
ening of research samples need to be based 
upon thoughtful consideration of the subject 
characteristics of intrinsic relevance to re- 
search problems. Once such criteria are 
established, a range of populations might 
be sampled: colleagues, community adults, 
friends and families of students—along with 
a range of special populations (e.g., military 
personnel, prisoners, etc.) whose experience of 
special situations might illumine important
problems rarely encountered by subjects from 
the general population. Moreover, a vast range 
of individuals whose biographies and personal 
documents are capable of transcending demo- 
graphic constraints of usual research samples 
is available for inquiry in general, differential,
and individual terms: for instance, Cox's
(1926) or Goertzel and Goertzel’s (1962)
studies of gifted individuals, or Baldwin's
(1942) analysis of personal letters.

On the relevance of research strategies. 
Three considerations may summarize the
status of the field: (a) over three-fourths
of the personality literature surveyed used
"experimental" methods; (b) these experimental
designs relied upon “remote control”
of variables which, as Schontz (1965) has 
noted, are especially vulnerable to ambiguity 
and error; (c) within general psychology, 
there is an increasing conviction that strictly 
"experimental" methods are incapable of deal- 
ing with central aspects of human psychology 
(Deese, 1969; Walker, 1969). 

One clear recommendation for personality 
study emerges from a serious reading of gen- 
eral psychological research criticism: until 
"experimental" methods are developed which 
can (a) accommodate present knowledge and 
(b) offer relevant new knowledge not attainable
through development of other scientific
tools, "experimental" studies of personality
should be clearly deemphasized. This is not 
so radical a proposal as might appear, for it 
simply asks “time out” for untangling a basic 
confusion in the field. One can study persons 
in experimental situations—but one cannot 
“study personality experimentally.” As Sarason
(1969), among others, has observed: “An
experimental approach is useful in analyzing 
reactions of particular people in special situations
[p. v; italics added].” Only as experimental
inquiry derives from concern for preexisting
subject variables, and provides experimental
treatments theoretically focused upon
such subject characteristics can experimental 
methods hope to illumine personality structure 
or dynamics. 

Meanwhile, the most serious thought should 
be given to alternative strategies of inquiry. 
Tyler (1959), in a paper which should be re- 
examined by all personality researchers, has 
presented a compelling case for the abandon- 
ment of dimensional approaches to individu- 
ality on the pragmatic grounds that this tradi- 
tional approach has failed to improve upon 
predictabilities. As an alternative, Tyler rec- 
ommended construction of personality inquiry 
in terms of “choice” and “organization”— 
along with a search for measurement approaches
capable of representing individuality.
Problems and possibilities for developing ap- 
proaches to reliability and validity in research 
on choice were presented in a further paper 
(Tyler, 1961), which incidentally serves as a 
model of an investigator’s serious involvement
in the intrinsic problems of research. 

A wide variety of alternative research 
strategies has become available in recent 
years: a range of “nonreactive” measures
(Webb, Campbell, Schwartz, & Sechrest, 
1966), methods of naturalistic observation 
(Barker, 1963; Raush, 1967), methods for 
investigating the single case (Davidson & 
Costello, 1969), to name a few. Meanwhile,
psychologists would do well to become ac- 
quainted with the work of contemporary an- 
thropologists who have developed concepts 
and methods of great relevance to personality 
study.⁶

On the scope of personality study. Over 30 
years ago, Murray (1938) noted that “the 
reason why the results of so many researches 
in personality have been misleading or trivial 
is that experimenters have failed to obtain 
enough pertinent information about their sub- 
jects. Lacking these facts, accurate generaliza- 
tions are impossible [p. ix].” This comment 
could stand as a summary of current work— 
with the important amendment that the ac- 
cumulation of more “facts” (including much 
unassimilated data collected through Explora- 
tions in Personality) has not provided, nor is 
likely to provide, the basic generalizations 
needed in this field. 

Beyond noting the extraordinarily narrow 
and impoverished scope of current personality 
research, what suggestions for broadening and 
deepening inquiry might be advanced? At a 
very general level, this problem involves 
setting the collective level of aspiration of 
the entire field to include a shared responsi- 
bility that personality be investigated in its 
basic aspects (“like all other men, like some 
other men, like no other men”). Within the 
context of any single investigation, this prob- 
lem requires that the individual researcher 
pay the most serious attention to the need 
for obtaining information which is (a) potentially
available, and (b) necessary to full


6 Interestingly, the insularity of the psychologist 
seems to be increasing in recent years. While anthro- 
pologists have consistently “borrowed” psychological 
concepts and methods, psychologists’ awareness of 
anthropological inquiry often seems to have been 
arrested at the level of Malinowski and early Mead, 
and our “borrowing” largely limited to fields of 
mathematics and physics. 




understanding of the intrinsic problem— 
whether or not such information is explicitly 
demanded by his immediate research design.⁷
Glaring examples of current failure to consider
appropriate breadth of scope may be
drawn from experimental research in which
(a) subjects' phenomenological reports are
routinely sought (via checklists, ratings, and
other means) and equally routinely nonanalyzed
and nonreported; and (b), even more
seriously, subjects’ reported experience is
misused to exclude individuals who do not readily
accommodate to the Procrustean theoretical
bed via “dropping” or “reassigning” recalci- 
trant subjects (as in dissonance research—cf. 
Chapanis & Chapanis, 1964) or via deliberate 
nonsampling of subjects known to fail to 
conform to theoretical expectations (e.g., 
Katz, 1967). 

More positively, the investigator should 
continually reconsider that the point of in- 
quiry is to understand the phenomena under 
investigation. Inevitably, this requires atten- 
tion to the intrapersonal context of research 
performances—here research subjects’ willing- 
ness to provide this context could be developed 
much more fully and fruitfully than we are 
inclined to do (cf. Lewin, 1935). 

More specific recommendations for extend- 
ing the scope of inquiry in selected aspects 
may also be noted: 

1. Critically important problems concern- 
ing the personality development and change 
clearly require longitudinal study. Why are 
longitudinal studies so rare? The traditional 
reasons are of two sorts: (a) longitudinal 
studies require such massive commitments of 
time and research technology as to demand 
large-scale organization support; and (b)
obsolescence of research concepts and methods 
over the life span of a longitudinal study 
poses problems.

However, a second look at the underlying 
assumptions is in order. If “longitudinal 
study” is identified with such large-scale en- 
deavors as the Fels studies (cf. Kagan & 


7 A colleague has offered an instructive example
from his own work: in a study (Levy, 1969) which 
involved asking 2800 school children about their 
preferences for Card IV versus Card VII of the 
Rorschach, he failed to inquire, Why? (N. Levy, 
personal communication, September 1969.) 
Moss, 1962) or the Berkeley Growth Study
(Jones & Bayley, 1941), the first of these
traditional deterrents is obviously relevant; 
but the Fels and Berkeley studies offer dem- 
onstrations that longitudinal data collected by 
imaginative and responsible investigators are 
remarkably fruitful fields for “up-dated” 
analyses which keep pace with advances in 
the field. However, more modest studies are 
urgently needed in the personality field and 
are quite feasible. Short-term (e.g., 5-year) 
investigations of selected aspects of personal- 
ity would immensely enrich inquiry, and need 
not involve massive and comprehensive 
sampling and instrumentation, as seen in such 
examples as Escalona and Heider’s (1959) 
predictions of nursery school behavior from 
observations in infancy, E. L. Kelly's (1955) 
20-year study of marriage partners, or Carl- 
son's (1965) 6-year follow-up study of sex 
differences in the basis of self-esteem. (Inci- 
dentally, the last example illustrates the feasi- 
bility of longitudinal investigation conducted 
without any financial or formal institutional 
support whatsoever.) Provision for longitudi-
nal follow-up studies could be readily built 
into a wide range of personality studies with 
very little additional effort—and with extra- 
ordinarily rich potentialities for advancement 
of knowledge. 

2. Somewhat paradoxically, personality re- 
search might be strengthened and enriched by 
becoming an incidental by-product (rather 
than the focus) of naturally occurring data-collection
situations. This rather obscure point
may, perhaps, be illustrated with the writer’s 
personal experiences in teaching upper division 
personality courses: In developing primary
instructional purposes, I have often asked stu- 
dents to produce brief, introspective (and 
anonymous) accounts of selected personality 
constructs for use in class projects exploring


3 time of original 
data collection (cf. Carlson, 1971). 


The relevance of such “incidental” data collection
to psychology courses—where the vast
bulk of personality research is conducted—is

might also serve important 
additional purposes of increasing the relevance
of psychology instruction to students' purposes
of exploring their own lives, and of
incidental "recruitment" of potential scholars
through the experience of disciplined inquiry
on personological questions). However, in
principle, this suggestion would apply equally 
to many other data-gathering settings, and 
fundamentally involves the use of informal 
"archival records" (whether freshman English 
themes, college-application essays, contents of 
employees’ suggestion boxes, comments of 
subjects in instructional-methods-evaluation 
studies, etc.) as unobtrusive measures of 
personality functioning in natural situations. 

3. Finally, as a general consideration rele- 
vant to many kinds of research problems, it 
is safe to assume that most subjects are will- 
ing to tell us much more than our current 
research designs ask about their experiences 
and the personal meanings of these experi- 
ences. Psychology is probably much poorer 
for its disinclination to listen to such potenti- 
ally important messages; hopefully, the recent 
demise of naive behaviorism will liberate us 
to take seriously human construction of ex- 
perience as more than merely countable “re- 
sponses” or “verbal reports.” 

On the relevance of social-psychological 
context. Two models of subject-investigator 
relationships may be discerned in current psychological
inquiry: (a) a contractual relationship
in which the subject is an “employee,”
and (b) a collaborative relationship in which 
the subject is a “colleague.” Because the con- 
tractual model appears to be the increasingly 
dominant one (and even "reform" proposals 
urging the collaborative mode contain strongly 
contractual features), some of its implications 
need to be examined. 

The contractual relationship seems to have 
developed in the service of scientific concerns 
and to reflect some basic values and assumptions
about human nature. Thus, paying subjects
for their services—whether in terms of
money or course credit, or both—is seen as 
minimizing volunteer bias, minimizing the de- 
pendency of subjects, offering a more mean- 
ingful model for the participation of non- 
college samples, avoiding troublesome prob- 
lems of overdetermined motives for research 
participation—and as expressing concerns for 


equity, for fairness and regularity in defining 
rights and roles of subjects. Undoubtedly 
a contractual model is capable of correcting 
certain abuses of research relationships and 
offers a relevant model for some kinds of 
inquiry. However, important consequences of 
the use of this model should be examined. 
The “contract” encourages investigators’ de- 
nial of the intrinsically “volunteer” quality of 
participation in personality research. More- 
over, by maintaining orderliness of one set of 
contractual obligations, investigators are en- 
abled to deny the legitimacy of other funda- 
mental obligations which are not written into 
the contract. (Two examples may illustrate 
this point: (a) scrupulous observation of 
obligations as an “experimenter” enables the 
academic researcher to ignore his equally rele- 
vant obligations as a "teacher" to provide 
maximum cognitive clarity to student-sub- 
jects; (b) the concept of "debriefing"—a 
military metaphor of extremely dubious rele- 
vance to psychological inquiry—implies that 
it is possible to undo an experimental set, and 
encourages the investigator to ignore his re- 
sponsibility for consequences to the subject 
which were not intended.) Further, a host 
of troubles have been introduced into the 
research enterprise as subjects, increasingly 
cynical about psychological inquiry and 
frankly motivated to pick up extra money, 
have tended to give only perfunctory atten- 
tion to research tasks. 

Fundamentally, the contractual model im- 
poses its own character upon research; it can 
only be appropriate in the investigation of 
contractual relationships, and even there may 
lead to immense confounding of the research 
findings by the effects of research context. In 
the study of personality, there are very few 
problems or occasions for which a contractual 
model is capable of providing valid informa- 
tion about persons' spontaneous ways of orga- 
nizing experience. As Loevinger (1966) has 
pointed out, a “contractual” interpersonal 
style is characteristic of particular stages of 
ego development, and thus important differ- 
ences among individuals are necessarily ob- 
scured when this interpersonal mode is also 
built into the context of inquiry. 

The alternative—a collaborative model— 
has its own problems and its own defining 
characteristics. Basically, the collaborative 
model does not insist upon control or stan- 
dardization of motivation for research par- 
ticipation, but assumes that (a) subjects and 
settings are chosen for their intrinsic rele- 
vance to the problem at hand; and (b) the 
basic motive for research participation must 
be the subject's intrinsic involvement in ex- 
ploring his own experience. (This motivation 
may take many equally appropriate forms: 
that of the patient wishing to be helped, the 
student wishing to understand and master his 
life experience, the excluded or alienated per- 
son who wants to assert and explore his indi- 
viduality, the "intelligent layman" who wants 
to express and understand his values and 
concerns in an intellectual framework.) The 
collaborative model does not rest upon narrow 
or egalitarian assumptions of undifferentiated 
“togetherness,” but assumes that subject and 
investigator have their different kinds of 
expertise which are united by a common belief 
in the possibility and value of clarification of 
experience through research participation. Un- 
questionably, a collaborative model is more 
demanding and more rewarding to subject and 
experimenter alike. It demands more candor 
and more thought on the part of the investi- 
gator in posing research problems, in engaging 
appropriate subjects, and in interpreting the 
nature of the experience; it demands more 
involvement from the subject, and offers the 
important reward of having his experience 
taken seriously. From the standpoint of per- 
sonality research, these conditions are likely to 
provide more genuine understanding of human 
personality organization and development.’ 
So much serious exploration of the ethical 
implications of research methodology has ap- 
peared in recent literature that very little 
needs to be added on this score. Our loss of 
innocence now requires that the psychologist 
give very serious attention to his own part 
in the research enterprise, accepting the re- 
sponsibility for his own choices and for the 
consequences of these choices. While several 
specific research recommendations might fol- 
low from the recent Enlightenment, the most 
general one would be the commandment: Do 
not administer to any subject a research 
treatment you have not first “taken” yourself. 
®An example of such collaborative research is 
provided by Sanford (1969). 
Although this proposal tends to elicit initial 
incredulity and irritation from professional 
colleagues, it is a completely serious recom- 
mendation: one which could enable psycholo- 
gists to examine and discard tendencies to 
impose unnecessary and brutally exhaustive 
testing programs or elaborate and unnecessary 
manipulation of their subjects—and, more im- 
portantly, engage the thoughtful participation 
of the psychologist as a person toward en- 
riching the relevance of his inquiry. Obvi- 
ously, this recommendation would eliminate 
deception as a technique of inquiry. However, 
there is considerable reason (Kelman, 1967; 
Stricker, 1967; Stricker, Messick, & Jackson, 
1967) to believe that this would be a very 
minor loss as contrasted with the increased 
veridicality and responsibility of nondeceptive 
research. 

Toward disciplinary responsibility. Clearly, 
the investigator must be permitted and en- 
couraged to define and explore problems in 
terms of their intrinsic merit. However, a 
climate of scientific freedom is not equivalent 
to a norm of laissez-faire. Since scientific in- 
quiry is currently conducted through an elabo- 
rate set of institutional apparatus, there are 
clear responsibilities at various levels of this 
network. Insofar as the individual investigator 
neglects social responsibility and ethical con- 
cerns, the agencies of public policy must fill 
that vacuum. Recent Public Health Service 
directives concerning the welfare of human 
subjects represent a benign exercise of this 
responsibility: more restrictive and irrelevant 
constraints upon psychological inquiry should 
be anticipated if psychologists fail to con- 
sider fully their own responsibilities in the 
conduct of research. 

Among several ways in which the discipline 
might cooperate toward development of more 
responsible and meaningful inquiry, the fol- 
lowing suggestions are offered: 

1. Psychology departments ... 
concern for the integrity ... 
purposes upon which req... 
participation are based. Thus, explicit plans 
for valid interpretations of problems, methods, 
and findings ... and 

(b) to class groups from which subjects are ... 
participation requirements as a means of 
obtaining subjects. 

2. Psychological journals inevitably play a 
major part in determining the content and 
methods of scientific inquiry, and thus bear a 
major responsibility for the quality of re- 
search. (This responsibility is particularly 
clear in the case of journals of the American 
Psychological Association which are directly 
supported by and responsible to the entire 
discipline.) Among the obvious ways in which 
our journals might exercise this responsibility, 
three specific suggestions are noted: (a) In- 
trinsic relevance and responsibility of inquiry 
could be fostered by the adoption of 
Loevinger’s (1968) two-fold suggestion that 
published studies should be based upon 
samples clearly relevant to the problem and 
upon replicated findings. (b) Explicit atten- 
tion to well replicated findings as "control" 
variables should be required for publication. 
An obvious example is the suggestion of sev- 
eral investigators (Carlson & Carlson, 1960; 
Garai & Scheinfeld, 1968) that sex differences 
be considered in all published studies em- 
ploying mixed-sex samples. Developmental 
Psychology, presently unique in making this 
an explicit criterion, provides a model of edi- 
torial responsibility in this sphere. (c) Since 
journals, unlike individual investigators, have 
unique responsibility and power to consider 
the total import of inquiry in any field, con- 
siderations of balance and emphasis fall within 
the domain of journal editors. (An example 
from recent history is the Journal of Social 
Psychology's giving explicit priority to cross- 
cultural research.) From the standpoint of the 
present critique, those journals primarily con- 
cerned with personality research should exer- 
cise their responsibility toward correcting im- 
balance in current research by giving priority 
to investigations of personality organization 
within individuals, and to inquiry on qualita- 
tive, typological bases of personality organiza- 
tion since the present survey of personality 
literature clearly shows these facets neglected 
in favor of general experimental studies. 


Where Is the Person in Personality Research? 

That the person is not really studied in 
current personality research is clearly shown 
in the survey of the literature. But is it pos- 
sible that the product of this inquiry, in its 
basic denial of the importance of personality, 
may be a faithful projection of the real lives 
and the real world of personality researchers? 
This is a chilling thought—but one which 
deserves very serious examination. 

Consider the passage with which Adelson 
(1969) ends the methodological section of 
his review of personality research: 


We like to pretend that our choice of methods is 
dictated by scientific considerations alone. In fact, 
the exigencies of the academic marketplace play an 
important and perhaps decisive role. The methodo- 
logical problems we have noted... reflect the 
pressure for quick publication. There is reason to 
doubt that there will be rapid reforms in method- 
ology until there is some reform of the university 
[p. 222, italics added]. 


It would be difficult to find a clearer state- 
ment of real despair: Adelson is suggesting 
that personality researchers really are such 
willing or powerless captives of field forces of 
academic and professional status definitions 
that they must await liberation at the hands 
of some external force which will alter the 
environment. 

Conceivably, studies of the “sociology of 
knowledge” might add to our awareness of— 
and liberation from—unexamined constraints 
upon inquiry. However, such studies are un- 
likely to tell us much more than we already 
know: that scholars are attracted to research 
problems through the influence of potent re- 
search models and teachers; that their train- 
ing, support, publication, and visibility are 
contingent upon the “cumulative” character 
of potential contributions; that innovative 
methods or findings are unlikely to achieve an 
impact until the field is prepared to assimilate 
them. From the standpoint of the present 
critique, it might be more valuable to develop 
inquiry in the “personology of knowledge," 
examining the personality characteristics 
which determine an investigator's resonance 
to and involvement in various substantive 
problems, his openness to innovations in con- 
tent and method, his independence of external 
supports in pursuit of inquiry, and his ca- 
pacity to transmit a sense of personal involve- 
ment in disciplined inquiry to his colleagues 
and students. 

Pending such inquiry, the current trends 
may well continue. However, even 
within these constraints, a more optimistic 
prognosis could be supported by a range of 
literature suggesting that significant personal 
change can result from the confrontation of 
the discrepancy between one's behavior and 
one's values; that such changes involve the 
engagement of one’s basic “ideoaffective 
postures” (Tomkins, 1965); and that such 
changes may be facilitated in times of wide- 
spread questioning and change. That we live 
in such a time of social change scarcely needs 
documentation; the present critique is offered 
as part of the discipline’s clear confrontation 
of its value-behavior discrepancies, and urges 
the engagement of our basic cognitive and 
affective commitments toward more relevant 
and responsible inquiry. 

The original question—Where is the person 
in personality research?—may have two dif- 
ferent answers. If the fully functioning per- 
son is not portrayed in our current personal- 
ity literature, we may simply be looking in 
the wrong place. There are serious investi- 
gators of personality—perhaps increasingly 
found outside psychology—whose inquiry and 
understanding will continue to appear in the 
books and papers which illumine the field. 
But this is no radical departure from the past. 
If White, Erikson, Tomkins, or Keniston—to 
mention a few prominent personologists—are 
not indexed in the volumes of our current 
personality journals, neither were Freud, Jung, 
Angyal, or Piaget. We might simply 
retitle our journals to reflect their functions 
somewhat more accurately, and proceed as 
before. 

A second answer is this: The person is 
there—in our personality laboratories, class- 
rooms, and in the community—waiting to be 
engaged in serious studies of personality once 
those of us who investigate personality 
become able to invest ourselves in this task. 


FURTHER COMMENTS—MISUSE OF ANALYSIS 
OF COVARIANCE 


DAVID R. HARRIS, CHARLES T. BISBEE, AND SELBY H. EVANS1 


Texas Christian University 


(1970) and Evans and Anastasio (1968). Examination of Sprott’s discussion 
... that valid use of the analysis of covariance 
requires that the covariate be unaffected by 
the treatment. The authors 
recommend consideration of a regression approach when the assumptions of 
the analysis of covariance are violated. 


Since the original development of analysis 
of covariance by Fisher (1932), the conditions 
and assumptions necessary for its appropriate 
application have been unusually vague. Thus 
considerable controversy has arisen. Evans 
and Anastasio (1968) attempted to clarify 
the implications of relaxing some of the 
original assumptions. In reply, Sprott (1970) 
has attempted to demonstrate that a liberal- 
ization of the restrictions of the use of analysis 
of covariance can be justified. The present 
paper extends and clarifies some of the state- 
ments of Evans and Anastasio, as well as re- 
plying to some of the comments of Sprott. 

The major objection raised by Sprott is 
that Evans and Anastasio pointed out that 
the covariance between the treatment effect 
and the covariate is assumed to be zero. Sprott 
contended that this requirement is too strict 
and that the proper requirement is that the 
expected value of this covariance must be 
zero. This distinction is a technical matter 
which does not lead to important differences 
in practice. Even Sprott agreed with Evans 
and Anastasio that the valid use of analysis 
of covariance requires that the covariate be 
unaffected by the treatment. 

The source of this disagreement lies in the 
difference between a fixed-effects and a ran- 
dom-effects model. As pointed out by Evans 
and Anastasio, the conventional model for 
analysis of covariance treats the covariate as 
a fixed effect. Scheffé 

has noted that the assumption probably is not 


In a fixed-effects covariance model, the 
values of the covariate are assumed to be 
constants. In the random-effects model called 


ables. This change leads to complexities in the 
expected mean squares and the formation of 
Indeed, Expression 5, which he 
offered as the expected value of the residual- 
ized treatment mean square, is stated in terms 
of the obtained values of X, treating the Xs 
as constants. In order to make this expression 
consistent with his discussion, he would have 
to substitute his Equation 6 for X and then 
deal with the variance thus introduced by the 
error term in Equation 6. 

In the conventional fixed-effects model for 
analysis of covariance, both the treatment 
effects and the covariate values are treated as 
constants. The assumption that their covari- 
ance is zero is no more remarkable than the 
assumption in the two-way analysis of vari- 
ance that the interaction terms sum to zero 
over rows and columns. 

A second point to be clarified concerns the 
fact that Sprott’s discussion is based on the 
conventional model, which uses only one β 
term. This thereby incorporates the assump- 
tion of homogeneity of between- and within- 
group regression. Sprott has thus avoided the 
major issue raised by Evans and Anastasio, 
that the presence of a correlation between 
treatment effect and covariate must be rep- 
resented by a separate regression coefficient. 
This second coefficient enters into the be- 
tween-group regression only and would imply 
inhomogeneity of the between- and within- 
group regression. 

This second point is important when the 
treatment effect is correlated with the covari- 
ate, a usage which Sprott apparently regarded 
as invalid but potentially meaningful. The 
calculations in this case are based on the as- 
sumption of homogeneity of regression and 
result in treating the data in terms of an 
average regression coefficient. This average 
does not accurately represent either within or 
between regression and thus may lead to in- 
accurate results. 

Sprott's comments on the second illustration 
presented by Evans and Anastasio demon- 
strate how this application can be misleading. 
Sprott corrected the arithmetic in the example 
and found an F ratio of 12 (p < .01, df= 
2/11). He then concluded that there was a 
residual variation which was not predictable 
from the covariate and thus could be ascribed 
to the treatment. The illustration was pre- 
sented by Evans and Anastasio to show how 
this usage could leave a significant treatment 
effect even when none was present. Since 
Sprott's calculations show the effects to be 
significant, the illustration still serves that 
purpose. The data were constructed by form- 
ing three groups such that for each group, 
the variate and covariate means were iden- 
tical. A random term was then added to each 
observation. Thus the conclusion drawn by 
Sprott—that there is a residual treatment 
effect not predictable from the covariate—is 
not correct. 

It is obvious from inspection of the data in 
this illustration that the variate means are 
completely predictable from the covariate 
means, except for the differences introduced 
by the error term. A simple approach, such 
as subtraction of the covariate values from 
the variate values, would give a better in- 
dication of the residual variance. More gen- 
erally, if one wants to residualize the variate 
means for regression on the covariate means, 
the coefficient which removes all the predict- 
able variance is the between-groups regression 
coefficient. 
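
The contrast between within-group and between-group regression can be sketched numerically. The following is a minimal illustration in Python, not the data of Evans and Anastasio: the three groups, the deviation pattern, and the function names are all hypothetical, chosen so that each group's variate mean equals its covariate mean while the within-group deviations are uncorrelated.

```python
from statistics import mean


def within_and_between_slopes(groups):
    """Return (pooled within-group slope, between-groups slope).

    `groups` is a list of groups; each group is a list of
    (covariate, variate) pairs.
    """
    grand_x = mean(x for g in groups for x, _ in g)
    grand_y = mean(y for g in groups for _, y in g)
    sxy_w = sxx_w = sxy_b = sxx_b = 0.0
    for g in groups:
        mx = mean(x for x, _ in g)
        my = mean(y for _, y in g)
        # Within-group sums of products and squares (deviations from group means).
        sxy_w += sum((x - mx) * (y - my) for x, y in g)
        sxx_w += sum((x - mx) ** 2 for x, _ in g)
        # Between-groups sums (group means about the grand means).
        sxy_b += len(g) * (mx - grand_x) * (my - grand_y)
        sxx_b += len(g) * (mx - grand_x) ** 2
    return sxy_w / sxx_w, sxy_b / sxx_b


def adjusted_means(groups, slope):
    """Variate means residualized on covariate means using the given slope."""
    grand_x = mean(x for g in groups for x, _ in g)
    return [
        mean(y for _, y in g) - slope * (mean(x for x, _ in g) - grand_x)
        for g in groups
    ]


# Hypothetical data: group means 10, 20, 30 on both covariate and variate;
# the within-group deviations have a zero cross-product.
deviations = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
groups = [[(m + dx, m + dy) for dx, dy in deviations] for m in (10, 20, 30)]

b_within, b_between = within_and_between_slopes(groups)
means_using_within = adjusted_means(groups, b_within)
means_using_between = adjusted_means(groups, b_between)

print(b_within, b_between)
print(means_using_within)
print(means_using_between)
```

In this constructed case the within-group slope leaves the group means far apart after adjustment, whereas the between-groups slope removes all of the difference that is predictable from the covariate means.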

It should also be noted that in many ap- 
plications, researchers seem to interpret analy- 
sis of covariance as removing variance which 
is attributable to or caused by the covariate. 
At best, this method merely removes variance 
which is correlated with the covariate. Psy- 
chologists rarely extend this confusion be- 
tween correlation and causation into other 
areas. Few psychologists would interpret a 
regression equation by saying that the pre- 
dictor caused the criterion values. Perhaps 
because of the similarity in the names and 
summary tables of analysis of variance and 
analysis of covariance, researchers are led 
into the misinterpretation that the implica- 
tions of the two methods are also similar. In 
fact, analysis of covariance is based on resi- 
dualization of the variate for regression on 
the covariate and so should be interpreted as 
any other regression analysis. 

There are, of course, many occasions on 
which psychologists would like to have a 
simple technique which would allow them to 
residualize the treatment means for the effect 
of the covariate, even when it is correlated 
with the treatment itself. The difficulty is that 
analysis of covariance is an unsuitable method 
because it does not allow for separate regres- 
sion for both between and within regression. 
A more appropriate tool for this purpose is a 
strictly correlational technique of the type 
proposed by Werts and Linn (1969). “It is 
preferable to adopt a regression approach 
rather than pretend to adopt ANCOVA [anal- 
ysis of covariance] when its assumptions are 
violated [Werts & Linn, 1969, p. 7]." The 
technique they recommend involves a linear 
regression estimate of the percentage of the 
variance accounted for by both the treatment 
and the covariate even when the two are 
statistically nonindependent. This is similar 
to the approach proposed by Cohen and by 
Overall for the general linear hypothesis. 
Although the present authors have not ex- 
amined the theoretical validity of these cor- 
relational approaches at this time, they do 
feel that such methods are potentially quite 
useful and deserve careful attention. 


NOTE ON WHY GENETIC CORRELATIONS ARE NOT SQUARED 


ARTHUR R. JENSEN1 


University of California, Berkeley 


Correlations between related persons (twins, siblings, parent-child, etc.) should not 
be squared in order to determine the proportion of variance they have in common. 
The correlation coefficient itself is this proportion. We are not using the correlation 
to predict the variance in a given trait for one set of persons from a knowledge of the 
trait values of their relatives, but to express the degree of overlap in trait variance, 
that is, the proportion of variance in common. The rationale of genetic correlations 
is explained, with examples in terms of a common elements model of correlation. 


Psychologists are often puzzled and confused 
by the fact that geneticists do not square the 
correlations between twins (or other kinship 
correlations) in order to obtain the percentage 
of variance explained by genetic factors. (Or, 
in the case of correlation between unrelated 
children reared together, the percentage of 
variance due to environmental factors.) Recent 
prominent examples of this confusion are found 
in Spuhler and Lindzey (1967, p. 403-404) and 
in Guilford (1967, p. 351-352). These authors 
incorrectly square kinship correlations and 
thereby arrive at erroneous conclusions. Most 
psychologists have learned to treat correlations 
as the square root of variance explained. But 
it is incorrect to take the square of twins or 
other kinship correlations to determine the pro- 
portion of variance attributable to genetic or 
environmental effects. The unsquared correla- 
tion itself is correctly interpreted as a propor- 
tion. Here is the reason: If the correlation be- 
tween phenotype (i.e., obtained score) and 
genotype (i.e., the hypothetical genetic value 
of the individuals) is $r_{pg}$ and if the correlation 
between phenotypes of pairs of individuals with 
the same genotypes but nothing else in com- 
mon (e.g., identical twins reared apart in ran- 
dom environments) is $r_{pp'}$, then $r_{pp'} = r_{pg}^{2}$, or 

A good analogy is with test reliability. Two 
equivalent forms of a test have only their true- 
score variance in common (analogous to ge- 
netic variance) and the error variance (anal- 
ogous to environmental variance) is not in 
common, that is, is uncorrelated. The correla- 
tion between equivalent forms, $r_{tt}$, is the reli- 
ability, or the percentage of true score variance 
(“genetic variance”) the tests share in com- 
mon. The $\sqrt{r_{tt}}$ is the correlation of obtained 
scores with true scores. Thus, the correlation 
between identical twins reared in uncorrelated 
environments is directly analogous to the cor- 
relation between equivalent forms of a test. 
The correlation in each case indicates the per- 
centage of variance in common, or the per- 
centage of genetic (or true score) variance. 
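
The relation can be written out as a short derivation. This is a sketch under stated assumptions that are not spelled out in the text: an additive model with phenotypes $P = G + E$ and $P' = G + E'$ for the two members of a pair, equal phenotypic variances, and environmental components uncorrelated with the genotype and with each other. The symbols $\sigma_G^2$ and $\sigma_P^2$ are introduced here for illustration.

```latex
\begin{align*}
P &= G + E, \qquad P' = G + E', \qquad
\operatorname{Cov}(G,E) = \operatorname{Cov}(G,E') = \operatorname{Cov}(E,E') = 0, \\
r_{pp'} &= \frac{\operatorname{Cov}(P,P')}{\sigma_P^{2}}
         = \frac{\sigma_G^{2}}{\sigma_P^{2}}, \\
r_{pg} &= \frac{\operatorname{Cov}(P,G)}{\sigma_P\,\sigma_G}
        = \frac{\sigma_G}{\sigma_P},
\qquad\text{so}\qquad r_{pg}^{2} = \frac{\sigma_G^{2}}{\sigma_P^{2}} = r_{pp'}.
\end{align*}
```

Under these assumptions the kinship correlation is already a ratio of variances, which is why it is read as a proportion without squaring.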

Another way of regarding the problem is in 
terms of the “common elements" formula for 
correlation (given in McNemar, 1949, pp. 117- 
118). This is 

$$r_{xy} = \frac{N_c}{\sqrt{N_x + N_c}\,\sqrt{N_y + N_c}}$$

where 

$N_c$ is number of elements common to variables X and Y, 

$N_x$ is number of elements unique to X, 

$N_y$ is number of elements unique to Y. 


A visually simple example is to consider the 
correlation of half-siblings, who have 25% of 
their genetic variance in common. The variance 
can be represented by squares, as in Figure 1. 
Assume $\sigma_x^2 = \sigma_y^2$, as would be the case for two 
sets of half-sibs. For simplicity assume $\sigma_x^2$ and 
$\sigma_y^2$ each equals 100. (Also, for simplicity assume 
there is no environmental variance.) Then, 
applying the common elements formula for 
correlation, we have 

$$r_{xy} = \frac{25}{\sqrt{75 + 25}\,\sqrt{75 + 25}} = \frac{25}{100} = .25$$

FIG. 1. Correlation of half-siblings who have 25% 
of their genetic variance in common. 


This is the correlation between half-sibs and is 
also the proportion of the genetic variance they 
have in common. The correlation between ob- 
tained scores and that part of the genetic vari- 
ance that half-sibs share in common is $\sqrt{.25}$ 
= .50. This can be visualized in Figure 2. 


Again, applying the common elements formula: 

$$r = \frac{25}{\sqrt{75 + 25}\,\sqrt{0 + 25}} = \frac{25}{50} = .50$$


FIG. 2. Correlation between obtained scores and 
shared genetic variance of half-sibs. 


Now, in this case, if we want to know the per- 
centage of total variance that is explained by 
the common genetic variance, we must square 
this correlation (.50), and this gives .25 or 25%, and, as can be 
seen in the diagram, this is one-fourth of the 
total area (variance). 
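
The two half-sib calculations can be checked with a few lines of Python. This is only an illustrative sketch: the function name and argument names are hypothetical, and the element counts (25 shared, 75 unique) follow from the simplifying assumption above that each variance equals 100.

```python
from math import sqrt


def common_elements_r(n_common, n_unique_x, n_unique_y):
    """Common-elements correlation: N_c / sqrt((N_x + N_c) * (N_y + N_c))."""
    return n_common / sqrt((n_unique_x + n_common) * (n_unique_y + n_common))


# Half-sibs: each total variance is 100, of which 25 is shared.
r_half_sibs = common_elements_r(25, 75, 75)

# Obtained scores (100 elements) against the shared genetic part alone
# (25 elements, none of them unique to it).
r_score_with_shared = common_elements_r(25, 75, 0)

print(r_half_sibs)               # correlation between half-sibs
print(r_score_with_shared)       # correlation of scores with the shared part
print(r_score_with_shared ** 2)  # proportion of total variance explained
```

The first value is read directly as a proportion of variance in common; only the second, a correlation between a score and one of its components, is squared to obtain a proportion.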
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COMPARISON OF NORMALIZATION THEORY AND NEURAL 
ENHANCEMENT EXPLANATION OF NEGATIVE 
AFTEREFFECTS 


RAY OVER1 


Dalhousie University 


It is proposed that neural inhibitory interaction underlies negative aftereffects as 
well as figural aftereffects, with the former occurring when nontopographic princi- 
ples are used by the nervous system to signal stimulus values and the latter when 
topographic coding is involved. The present paper examines relationships between 
aftereffect data and information available about feature analysis from electro- 
physiological measurement and contour masking studies. Most attention is given 
to tilt and movement aftereffects. Even though the neural enhancement position 
cannot be tested in detail until more is known about tuning characteristics in sen- 
sory systems and the way cells respond to abrupt changes in stimulus value, in its 
present form it offers a more satisfactory explanation of negative aftereffects than 
normalization theory does. There is little evidence to support the latter claim that 
negative aftereffects occur as by-products of a perceptual adaptation process. 


Curved lines appear to straighten when 
viewed over a period of time; in addition, 
straight lines seen immediately afterwards 
look curved in the direction opposite to the 
previously viewed contours (Gibson, 1933). 
The degree to which experience becomes less 
intense during the course of prolonged inspec- 
tion of a constant stimulus value was termed 
adaptation by Gibson (1937b), and perceptual 
distortion consequent upon abrupt variation 
in stimulus value was referred to as negative 
aftereffect. Adaptation with an accompanying 
negative aftereffect occurs with the perception 
of contour orientation, movement, brightness, 
color, and skin temperature, as well as with 
visual and kinesthetic judgment of curvature. 
There have been two approaches to the expla- 
nation of negative aftereffects. Gibson (1937b, 
1959b) has been concerned with analyzing 
the way in which psychophysical correspond- 
ence is changed during and after prolonged 
stimulation rather than with establishing the 
neural correlates of perceptual distortion. 


In other cases (e.g., Day, 1962, 1969; Ganz, 
1966b; Köhler & Wallach, 1944) negative 
aftereffects have been attributed to operations 
by which signals generated in the nervous 
system by one stimulus are modified as the 
consequence of prior exposure to another 
stimulus. 

The present paper compares these two 
positions. It is shown that Gibson's normaliza- 
tion theory differs from accounts proposing 
neural mechanisms of negative aftereffects 
in respects other than the level of analysis 
at which each is formulated. As the difference 
lies in predictions about which variables exert 
major influence over the magnitude of nega- 
tive aftereffect, each explanation is evaluated 
by determining the degree to which the pre- 
dictions it generates are consistent with 
experimental findings. It is argued in the 
present paper that even though the neural 
enhancement position cannot be tested in 
detail until more is known about tuning 
characteristics in sensory systems and the 
way cells respond to changes in stimulus value, 
in its present form it offers a more satisfactory 
explanation of negative aftereffects than 
normalization theory does. There is little 
evidence to support the latter claim that 
negative aftereffects occur as by-products of a 
perceptual adaptation process. It is further 
proposed that what Gibson has referred to as 
adaptation has the same neural correlates as 
negative aftereffects such that the two can be 
differentiated solely by reference to experi- 
mental procedures. Attention is also given 
to the relationship between negative after- 
effects described by Gibson (1937b) and 
figural aftereffects later reported by Köhler 
and Wallach (1944). It is suggested that both 
types of aftereffects occur because abrupt 
stimulus changes result in exaggerated modi- 
fication in neural response before a level of 
neural activity appropriate to the new value 
of stimulation is attained, but that negative 
aftereffects are produced within modalities in 
which stimuli are signaled via broadly tuned 
neural analyzers, and figural aftereffects occur 
in cases where stimulus properties are coded 
topographically. 


ASSESSMENT OF NORMALIZATION THEORY 


Gibson (1933) claimed that curvature after- 
effects occur because the experience evoked 
by each stimulus value belonging to the con- 
cave-straight-convex dimension is modified 
for a period of time after inspection to the 
extent that experience neutralized during 
exposure to curved inspection lines. The 
following statement summarizes Gibson’s posi- 
tion: 


If a sensory process which has an opposite is made to 
persist by a constant application of its stimulus- 
conditions, the quality will diminish in the direction 
of becoming neutral, and therewith the quality evoked 
by any stimulus for the dimension in question will be 
shifted temporarily toward the opposite or comple- 
mentary quality . . . Negative aftereffect is a by- 
product or incident of the primary fact of adaptation 
to a norm [Gibson, 1937b, p. 223, p. 241]. 


Gibson's claim that negative aftereffects result 
from adaptation and reflect the dimensional 
properties of the inspection and test stimuli 
allows normalization theory to be tested in 
several ways. Experimental data bearing on 
the theory are now considered. 


Angular Function of Tilt Aftereffects 


A vertical line looks slightly tilted when 
judged following prolonged inspection of tilted 
lines (Gibson & Radner, 1937). The direction 
and magnitude of the aftereffect are dependent 
on the orientation of the inspection figure; a 
vertical test line appears displaced in a counter- 
clockwise direction after exposure to lines 
tilted less than 45 degrees in a clockwise 
direction, and inspection of lines tilted between 
45 degrees and 90 degrees results in clockwise 
displacement of the test line. Judgments of the 
test stimulus are not affected by exposure to an 
inspection figure tilted 0, 45, or 90 degrees. 
In explaining these results, Gibson (1937a, 
1937b) proposed that the vertical and hori- 
zontal spatial directions constitute neutral 
points of separate (but linked) oppositional 
series which function such that progressive 
perceptual adaptation occurs towards the 
nearest norm during inspection of a tilted line. 
He further assumed that negative aftereffect 
is produced through normalization. In terms 
of this analysis no aftereffect occurs when the 
inspection figure is at either norm or at 45 
degrees; the latter tilt is midway between 
the two norms and does not adapt more to 
one norm than the other. A line tilted less than 
45 degrees from the vertical adapts to the 
vertical during inspection, and a line tilted 
more than 45 degrees adapts to the horizontal; 
the vertical test figure is thus repelled from the 
inspection figure in the former case and 
attracted in the latter. Normalization theory 
does not, however, explain why repulsion 
aftereffects (called “direct” effects by Gibson) 
are larger than attraction aftereffects (“in- 
direct” effects). 
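
The qualitative prediction of the two-norm account can be stated compactly. The sketch that follows is illustrative only: the function name and the sign convention (+1 for repulsion of the vertical test line from the inspection figure, -1 for attraction, 0 for no aftereffect) are assumptions introduced here, and the sketch deliberately says nothing about magnitude, which is the point on which the account is silent.

```python
def predicted_tilt_aftereffect(inspection_tilt_deg: float) -> int:
    """Direction of the aftereffect on a vertical test line predicted by
    the two-norm account, for a clockwise inspection tilt of 0-90 degrees.

    Returns +1 for repulsion ("direct" effect), -1 for attraction
    ("indirect" effect), and 0 for no aftereffect.
    """
    if not 0 <= inspection_tilt_deg <= 90:
        raise ValueError("tilt must lie between 0 and 90 degrees")
    if inspection_tilt_deg in (0, 45, 90):
        return 0    # at a norm, or midway between the two norms
    if inspection_tilt_deg < 45:
        return +1   # the inspected line adapts toward the vertical norm
    return -1       # the inspected line adapts toward the horizontal norm


for tilt in (0, 10, 45, 80, 90):
    print(tilt, predicted_tilt_aftereffect(tilt))
```

Because the rule returns only a direction, it cannot by itself account for the asymmetry in size between direct and indirect effects.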

Tilt aftereffects occur with kinesthetic as 
well as visual judgment; a horizontal rod feels 
slightly tilted following kinesthetic inspection 
of a tilted rod. This negative aftereffect was 
first reported by Gibson (1937b), who pro- 
posed that similar normalization processes 
underlie visual and kinesthetic aftereffects. 
The angular function of kinesthetic tilt after- 
effect (Over, 1967a; Singer, Flanagan, & 
Collins, 1968) is different, however, from that 
found with visual aftereffect (Gibson & 
Radner, 1937; Morant & Harris, 1965). Both 
direct and indirect effects are found when 
kinesthetic test judgments are made to the 
vertical, but direct effects alone are obtained 
with judgment of the horizontal. As inspection 
of a horizontal or vertical line does not produce 
aftereffects in either modality, it is unlikely 
that visual and kinesthetic space have different 
norms. Entirely different normalization proc- 
esses must therefore occur in the two senses 
if Gibson's analysis is to be applied con- 
sistently; a figure tilted 30 degrees from the 
vertical must adapt to the vertical during 
visual inspection, but to the horizontal (and 
to a greater degree than a line tilted only 15 
degrees from the horizontal) during kin- 
esthetic inspection. The more reasonable 
conclusion is that the magnitude of a tilt 
aftereffect reflects the properties of the sensory 
system involved in inspection and subsequent 
test judgments, and is not dependent simply 
on the location of the inspection and test 
contours relative to the norms of the series 
to which they belong. 


Development and Decay Functions of Negative 
Aftereffects 


Gibson’s claim that negative aftereffects 
occur as consequences of normalization arose 
from an experiment (Gibson, 1933) which 
demonstrated that the amount by which a 
straight line appears curved immediately after 
10-second inspection of curved lines equals 
the reduction which has taken place in the 
apparent curvature of the inspected lines over 
the viewing period. Although adaptation and 
negative aftereffect have not been compared 
in this way in any later experiments, there is 
ample evidence that aftereffects can occur in 
the absence of normalization. 

Extended inspection of a physically stable 
stimulus produces progressive perceptual adap- 
tation (e.g., Goldstein, 1957; Hochberg, 
Triebel, & Seaman, 1951; Kenshalo & Scott, 
1966; Taylor, 1963b). The magnitude of an 
aftereffect should therefore be a function of 
the duration of the inspection interval if 
negative aftereffects occur solely as the con- 
sequence of normalization, and the degree to 
which normalization is complete is dependent 
on the period of time the subject is exposed 
to the inspection stimulus. Gibson and Radner 
(1937) found larger visual tilt aftereffects 
as inspection was lengthened from 1 second 
to 45 seconds; more prolonged inspection 
did not increase the magnitude of the after- 
effect. Similar development functions were 
obtained with figural aftereffects (Hammer, 
1949) and movement aftereffects (Taylor, 
1963a). 

However, in measuring development func- 
tions the above experimenters made no al- 
lowance for decay in aftereffect occurring in 
the interval between cessation of inspection 
and completion of the postinspection judgment 
(Ikeda & Obonai, 1953; Oyama, 1953). This 
period is referred to as the judgment-time 
interval even though it also involves the time 
taken by the experimenter to remove the 
inspection figure and display the test stimulus. 
Day, Burns, Singer, Holmes, and Letcher 
(1967), in a study employing 403 subjects, 
found the mean judgment-time interval for 
visual aftereffect was 9.57 seconds, and for 
kinesthetic aftereffect 13.93 seconds. As after- 
effects decay exponentially following inspection 
at a rate dependent on the duration of the 
inspection period (Hammer, 1949; Ikeda & 
Obonai, 1953), at least part of the development 
function found under the conditions normally 
employed to measure aftereffects occurs be- 
cause most of the aftereffect produced by brief 
inspection has decayed, relative to the after- 
effect resulting from prolonged inspection, 
by the time the postinspection judgment is 
complete. In support of their claim that de- 
velopment functions for aftereffect are largely, 
if not totally, artifactual, Ikeda and Obonai 
(1953) and Oyama (1953) demonstrated that 
the aftereffect generated by 1-second inspection 
is of the same magnitude as the aftereffect 
found following an inspection period of 240 
seconds if postinspection judgments are com- 
pleted immediately. 
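
The artifact can be illustrated numerically. The sketch that follows is not a model fitted to any of the cited data: it assumes an initial aftereffect of the same size at every inspection duration, exponential decay, and a purely hypothetical rule (`decay_time_constant`) by which decay slows as inspection lengthens. The only figure taken from the text is the 9.57-second mean judgment-time interval reported for visual aftereffect.

```python
import math

INITIAL_MAGNITUDE = 1.0  # arbitrary units; assumed equal for every inspection duration


def decay_time_constant(inspection_s: float) -> float:
    """Hypothetical rule: longer inspection gives slower decay (seconds)."""
    return 2.0 + 0.5 * inspection_s


def measured_aftereffect(inspection_s: float, judgment_delay_s: float) -> float:
    """Aftereffect remaining at the moment the judgment is completed."""
    tau = decay_time_constant(inspection_s)
    return INITIAL_MAGNITUDE * math.exp(-judgment_delay_s / tau)


# Immediate judgment versus the mean visual judgment-time interval.
for delay in (0.0, 9.57):
    row = [round(measured_aftereffect(t, delay), 3) for t in (1, 45, 240)]
    print(f"delay {delay:5.2f} s ->", row)
```

Under these assumptions the measured aftereffect grows with inspection time whenever the judgment is delayed, although the initial magnitude is constant; with immediate judgment the apparent development function disappears.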


Direction of Aftereffect Relative to Normalization 


Figures moving in one direction at a constant 
speed appear to slow down during prolonged 
inspection (Goldstein, 1957; Taylor, 1963b). 
Normalization theory implies not only that a 
stationary test figure will appear to move in 
the direction opposite to the inspection figure, 
but also that the apparent speed of a moving 
test stimulus will be temporarily decreased 
if inspection and test movements are in the 
same direction and increased if the two stimuli 
move in opposite directions. Scott, Jordan, 
and Powell (1963) found that movement 
aftereffect adds to objective motion when the 
inspection and test movements are in opposite 
directions. Carlson (1962) and Rapoport 
(1964), however, did not find any evidence of 
additivity under these conditions; both experi- 
menters also reported that the apparent speed 
of a test figure moving in the same direction 
as the inspection stimulus was either un- 
changed or increased if its velocity was greater 
than that of the inspection stimulus. The 
perceptual shift in this latter case is in the 
direction opposite to that in which normaliza- 
tion occurs during inspection. 


Aftereffects of Adaptation to Transformed Vision 


Gibson (1959b) has claimed that the normal- 
ization process responsible for negative after- 
effect also underlies errors made in visually 
guided behavior after prolonged exposure to 
spatially transformed visual stimulation (e.g., 
I. Kohler, 1964). He earlier demonstrated 
(Gibson, 1933) that inspection of straight 
lines through prisms that produce visual 
curvature generates aftereffects, which are of 
the same magnitude as aftereffects resulting 
from inspection of curved lines viewed without 
optical transformation, when straightness is 
later judged with the prisms removed. This 
relationship holds, however, only if the subject 
does not make active visually controlled move- 
ments during inspection; the transformed input 
condition yields much larger aftereffects when 
active movement is permitted (Mikaelian & 
Held, 1964). Several other methods have been 
used (see Rock, 1966) to demonstrate that two 
different types of aftereffect exist. In some 
experiments (Day & Singer, 1967; Over, 
1967b) conditions were arranged so that the 
aftereffects occurred in opposite rather than 
similar directions; this method has shown that 
the two aftereffects decay at different rates 
following cessation of inspection. 

Day and Singer (1967) have shown that 
aftereffects induced by inspection of trans- 
formed stimulation resemble motor learning 
and transfer effects. They used the term 
"behavioral compensation" to refer to per- 
ception under these conditions, and described 
aftereffects resulting from inspection of non- 
transformed stimulation as “sensory” because 
of their decay characteristics (see Ganz, 
1966a). Day and Singer further showed that 
many experimenters have failed to distinguish 
the two types of aftereffect when interpreting 
data. It is sufficient to note here that the 
adequacy of an explanation of negative after- 
effects should not be assessed by determining 
whether the explanation can also be applied to 
errors which occur in visually guided behavior 
following exposure to spatially transformed 
visual input. 


Aftereffects and Illusions 


Normalization theory was developed to 
explain why perceptual error occurs when one 
stimulus is judged after inspection of another 
stimulus belonging to the same continuum. 
Perceptual error also occurs when two stimuli 
belonging to the same continuum are presented 
at the one time; the distortions found in the 
latter case are usually referred to as illusions 
rather than aftereffects. Illusions which are 
either extremely similar to or identical with 
negative aftereffects have been reported for 
the perception of tilt of line (Gibson, 1937a), 
movement (Brown, 1931), skin temperature 
(Békésy, 1962), color (Akita, Graham, & 
Hsia, 1964), and brightness (Ratliff, 1965). 
In the case of tilt of line, Gibson (1937a) found 
that the amount by which a vertical line looks 
tilted when viewed together with a tilted line 
follows the same angular function as visual 
tilt aftereffect (Gibson & Radner, 1937). 
This close similarity makes it unlikely that 
the illusion and the aftereffect have different 
determinants. Gibson has claimed that nega- 
tive aftereffects result from a normalization 
process which develops over time. This analysis 
cannot be applied to illusions because per- 
ceptual error occurs immediately as the figure 
is displayed and diminishes, rather than 
intensifies, as the interval is increased for 
which the figure is shown (Piaget, 1961). 


Assessment 

Several characteristics of negative after- 
effects cannot be predicted solely from knowl- 
edge of the location of the inspection and post- 
inspection stimuli relative to the norms of 
the oppositional series to which they belong. 
In addition, adaptation and negative after- 
effect have different temporal determinants 
such that aftereffects are found at full strength 
under conditions where adaptation has either 
not taken place or is incomplete. Gibson’s 
basic assumption that negative aftereffects 
occur as by-products of an adaptation process 
is therefore incorrect. It does not necessarily 
follow that an adequate explanation of negative 
aftereffect cannot be given within a psycho- 
physical framework. In the remainder of the 
present paper, however, attention is given to 
theories which attribute perceptual distortions 
found in the aftereffect situation to operations 
by which information about stimulus proper- 
ties is signaled and classified within sensory 
systems. Relationships are examined between 
aftereffect data and information available 
about feature analysis from electrophysio- 
logical measurement and contour masking 
studies. It is argued that even though neural 
enhancement explanations of negative after- 
effect cannot be tested in detail until more is 
known about tuning characteristics in sensory 
systems and the way cells respond to abrupt 
changes in stimulus value, in their present 
form they offer a more satisfactory explanation 
of negative aftereffect than normalization 
theory does. 


RELATIONSHIP BETWEEN NEGATIVE AFTER- 
EFFECTS AND FIGURAL AFTEREFFECTS 


Köhler and Wallach (1944) reported that 
prolonged fixation of contours modifies the 
apparent size and location of figures sub- 
sequently seen in the same part of the visual 
field. They found, for example, that a circle 
looks smaller when seen after inspection of a 
larger circle, and that two contours are mis- 
aligned following fixation of a figure in the 
same part of the visual field as one of the 
contours. These distortions were described 
as figural aftereffects. Similar aftereffects were 
later reported for kinesthesis (Köhler & 
Dinnerstein, 1947) and audition (Deutsch, 
1951). 

Köhler and Wallach contended that figural 
aftereffects cannot be dependent on normal- 
ization because the size continuum does not 
have oppositional properties or a neutral 
point. They proposed instead that figural 
aftereffects occur because prolonged fixation 
of a figure results in localized resistance to 
current flow in the primary projection area 
of the cortex such that input from contours 
subsequently presented in the same part of the 
visual field is displaced into nonsatiated re- 
gions. They also attributed negative after- 
effects to satiation following an experiment 
which compared the visual tilt aftereffect 
found with a tilted inspection figure and ver- 
tical test stimulus with measures obtained 
when the inspection figure was vertical and 
the test figure tilted. From Gibson's analysis 
the inspection figure is at the norm in the latter 
case, and no aftereffect should occur. Köhler 
and Wallach (1944), however, obtained tilt 
aftereffects of the same magnitude with the 
two arrangements. Although Templeton, How- 
ard, and Easting (1965) have subsequently 
shown that inspection of a vertical line results 
in a much smaller aftereffect than inspection 
of a tilted line, the occurrence of an aftereffect 
at all with the former arrangement is never- 
theless contrary to normalization theory. 
Although Gibson (1959a) agreed that neural 
processes of the type proposed by Köhler 
and Wallach underlie figural aftereffects, he 
maintained that tilt aftereffects reflect the 
oppositional nature of the tilt continuum and 
arise through normalization. One difficulty 
with this position is that similar development 
and decay functions are found for tilt and size 
aftereffects (Ikeda & Obonai, 1953), and in 
fact for negative aftereffects and figural 
aftereffects in general (Taylor, 1962). The 
two types of aftereffect differ, however, in 
one important respect: visual tilt aftereffects 
occur equally with fixation and with free 
movement of the eyes across the inspection 
figure (Gibson & Radner, 1937; Held, 1962), 
but figural aftereffects are reduced in magni- 
tude if subjects fail to maintain constant 
fixation (Ganz, 1966b). This suggests that 
figural aftereffects are dependent on processes 
restricted to parts of the visual field stimu- 
lated during inspection, and negative after- 
effects are not. 

In examining this question, Morant and 
Mikaelian (1960) found a mean tilt aftereffect 
of 1.09 degrees when the inspection and test 
contours differed by 13 degrees and were 
located on opposite sides of the vertical merid- 
ian of the retina (interfield aftereffect) com- 
pared with 1.52 degrees when the two figures 
coincided spatially (intrafield aftereffect). 
Interfield transfer indicates that tilt after- 
effects cannot be wholly dependent on topo- 
graphically restricted processes in the visual 
system because the receptive fields of cells 
sensitive to contour orientation always ter- 
minate abruptly at the vertical meridian 
(Hubel & Wiesel, 1965). Morant and Harris 
(1965) subsequently proposed that two proc- 
esses are responsible for intrafield tilt after- 
effects. Their claim was based on measures of 
the angular function of aftereffect obtained 
when subjects inspected a tilted line located 
7 degrees to one side of the fixation point 
before setting a line (T₁), presented on the 
opposite side of the vertical meridian, parallel 
to an objectively vertical line (T₂) shown 
where the inspection figure had been dis- 
played. Contour repulsion was found with 
each angle of tilt of the inspection figure and 
was maximal for tilts around the vertical. 
In explaining this result, Morant and Harris 
assumed that normalization influenced the 
perception of T₁ and T₂ equally; the after- 
effect thus reflected the process (satiation) 
confined to the part of the visual field stimu- 
lated during inspection. They further assumed 
that normalization, if operating alone, would 
produce an S-shaped angular function with 
contour repulsion and attraction effects of 
equal magnitude; the asymmetrical S-shaped 
angular function which is found for intrafield 
aftereffect was attributed to algebraic summa- 
tion of satiation and normalization effects. 
Muir and Over (1970) have questioned this 
interpretation. They found no evidence of 
interfield transfer of aftereffect when, by the 
argument of Morant and Harris (1965), the 
influence of the satiation process was factored 
out by presenting T₁ as the inspection figure 
and subsequently requiring subjects to set T₂, 
presented without T₁, to the apparent vertical. 
They also pointed out that the procedure used 
by Morant and Harris confounded intrafield 
and interfield differences with differences 
between central and peripheral vision. The 
experiment reported by Muir and Over showed 
that the angular function found by Morant 
and Harris when subjects made test judgments 
by setting T₁ and T₂ parallel is identical to 
the angular function obtained for intrafield 
aftereffect when the inspection stimulus and 
the test figure (a single line which the subject 
sets to the vertical) are seen in peripheral 
vision. Reasons why different angular functions 
are obtained for tilt aftereffect in central and 
peripheral vision are considered later. 

If tilt aftereffects are topographically re- 
stricted to the part of the visual field stimu- 
lated by the inspection figure, are they gen- 
erated by the same error-inducing processes 
as figural aftereffects? The satiation theory 
(Köhler & Wallach, 1944) and the statistical 
theory (Osgood & Heyer, 1952) need not be 
considered in this connection as neither pro- 
vides an adequate explanation of figural 
aftereffects (see Deutsch, 1956; Ganz, 1966b). 
Attention is instead directed to an analysis 
of figural aftereffect in terms of lateral inhibi- 
tory interaction given by Ganz (1966b). He 
proposed that the location of a contour is 
signaled in the visual system by the mean of a 
spatially distributed ridge of excitation, but 
that excitation is proactively suppressed if 
another figure has recently been viewed in 
the same part of the visual field. Because the 
degree to which inhibition occurs is dependent 
on the spatial separation of the two lines, the 
distribution generated by the second contour 
is skewed, and its mean is shifted. Ganz has 
applied this analysis to errors made when 
subjects attempt to align two contours after 
having viewed a figure near where one of the 
lines was displayed. The mechanism Ganz 
proposed to explain figural aftereffects does 
not, however, indicate why tilt displacements, 
as opposed to lateral shifts in contour location, 
occur following inspection of a tilted line. 
If the extent to which parts of a line are 
shifted through inhibition is dependent on 
intercontour distance, a bowed line distortion 
would occur rather than a change in the tilt 
of the total line (Sutherland, 1961). The 
further possibility is that lateral inhibitory 
interaction underlies both negative aftereffects 
and figural aftereffects, and that different 
neural analyzers are involved in signaling 
information in the two cases. 
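
The mean-shift mechanism attributed above to Ganz (1966b) can be sketched numerically. The sketch rests on assumptions that are not taken from Ganz: the excitation ridge is Gaussian, suppression falls off as a Gaussian function of distance from the earlier contour, and all parameter values (`sigma`, `inhib_sigma`, `inhib_gain`) are hypothetical. It shows only that distance-dependent suppression skews the ridge and moves its mean away from the earlier contour.

```python
import math


def perceived_location(test_pos, inspect_pos,
                       sigma=1.0, inhib_sigma=1.0, inhib_gain=0.5):
    """Mean of a Gaussian excitation ridge after proactive suppression
    that weakens with distance from the earlier (inspection) contour."""
    n = 1201                                  # grid points across +/- 6 sigma
    step = 12.0 * sigma / (n - 1)
    total_weight = 0.0
    weighted_sum = 0.0
    for i in range(n):
        x = test_pos - 6.0 * sigma + i * step
        excitation = math.exp(-((x - test_pos) ** 2) / (2 * sigma ** 2))
        inhibition = inhib_gain * math.exp(
            -((x - inspect_pos) ** 2) / (2 * inhib_sigma ** 2))
        weight = excitation * (1.0 - inhibition)
        total_weight += weight
        weighted_sum += x * weight
    return weighted_sum / total_weight


# Test contour at 0; earlier contour just to its left, then far away.
print(perceived_location(0.0, -0.5))
print(perceived_location(0.0, 10.0))
```

A nearby earlier contour displaces the mean away from itself, and a distant one leaves it essentially unchanged. The displacement is a lateral shift of the whole ridge, which is why the mechanism on its own yields contour repulsion rather than a change in tilt.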


NEURAL ENHANCEMENT IN SENSORY SYSTEMS 
EMPLOYING TOPOGRAPHIC AND NON- 
TOPOGRAPHIC CODING PRINCIPLES 


The proposal now considered is that nega- 
tive aftereffects as well as figural aftereffects 
occur because abrupt stimulus changes result 
in exaggerated modification in neural response 
before a level of neural activity appropriate 
to the new value of stimulation is attained. 
It is suggested, however, that negative after- 
effects are produced only within modalities 
in which stimulation is classified by broadly 
tuned neural analyzers, while figural after- 
effects occur in cases where stimulus properties 
are coded topographically. As the account of 
figural aftereffects given by Ganz (1966b) 
is consistent with the approach being adopted, 
attention is mainly given to negative after- 
effects. 

Erickson (1968) has distinguished two cod- 
ing principles employed by afferent systems 
in transmitting information. Some stimulus 
properties (e.g., the location of a figure in the 
field of view, or on the surface of the skin) 
are represented topographically in the nervous 
system; particular values of each dimension 
are signaled in terms of which of a spatially 
arranged set of neurons are excited. In such 
systems each neuron is responsive to a re- 
stricted range of the stimulus dimension, and 
the excitatory ranges of adjacent neurons 
overlap to only a limited degree. In other 
modalities, however, neurons are broadly 
tuned such that each single unit, by its dis- 
charge frequency, can signal most values 
within the dimension. Continua for which 
this coding principle applies include wave- 
length (De Valois, Jacobs, & Jones, 1963), 
luminous intensity (Jacobs, 1965), visual 
tilt of line (Hubel & Wiesel, 1968), kinesthetic 
tilt of line (Mountcastle, Poggio, & Werner, 
1963), and skin temperature (Hensel, Iggo, 
& Witt, 1960). 

Modalities which employ nontopographic 
coding principles have an opponent process 
organization. Although each neuron responds 
over much of the stimulus dimension to which 
it is sensitive, at one extreme of the dimension 
some neurons are highly active and others 
relatively silent, and at the other extreme 
the reverse relationship holds. Specific ex- 
amples of oppositional functioning are con- 
sidered later. At present it can be noted that 
in systems organized in this way different 
stimulus values are signaled by different 
patterns of activity within the total set of 
neurons responsive to the dimension in ques- 
tion, rather than in terms of which neurons are 
excited. 

Sensory systems enhance information about 
stimulus contrast at the expense of information 
about absolute properties of stimulation. 
Simultaneous contrast effects have received 
more attention in electrophysiology experi- 
ments than successive contrast effects (see, 
for example, Ratliff, 1965). It has been shown, 
however, in several modalities that a sudden 
change in stimulus value leads to a state such 
that sensory input, for a short period of time, 
reflects the modification that has occurred in 
stimulation rather than simply the value of the 
newly introduced stimulus (e.g., Barlow & 
Hill, 1963; Hensel & Boman, 1960; Mount- 
castle [...], 1959; Ratliff, Hartline, & 
Miller, 1963). Neurons which become more 
excited as a consequence of the change in 
stimulus value overshoot the activity level 
appropriate to the new stimulus and do not 
adapt to this discharge rate until some time 
later. A shift in stimulus value in the opposite 
direction produces an undershoot in neural 
activity, and in some cases complete suppres- 
sion, before recovery occurs. 

Sensory systems enhance information about 
stimulus change only to the extent that the 
same neural units are involved in signaling 
the preshift and postshift stimulus conditions. 
In topographic systems entirely different sets 
of neurons are excited unless the temporally 
separated stimuli are close together spatially; 
exposure to the preshift stimulus has no effect 
on input generated by the postshift stimulus 
when they are not. Deutsch (1964) and Ganz 
(1966b), in the accounts they offered of 
figural aftereffects, have discussed interactions 
that occur under the former conditions. In 
nontopographic systems with broadly tuned 
neurons and oppositional organization, changes 
in stimulus value over much of the relevant 
dimension produce an overshoot in the re- 
sponse of some neurons and an exaggerated 
reduction in the response of other cells. For a 
period of time after the stimulus has been 
changed, the pattern of activity within the 
total set of neurons will be similar to the stable 
discharge pattern characteristic of a main- 
tained stimulus value which differs more from 
the preshift stimulus value than the postshift 
stimulus does. In proposing that negative 
aftereffects occur through enhancement of 
differences in the response of opponent- 
functioning neural units, it is necessary to 
assume that the brain, in interpreting the 
input generated by the postshift stimulus, 
operates in accord with a single and simple 
rule; namely, transient changes in response are 
not recognized as such by the input-classifying 
process, but are instead treated at all times 
as though they represent the absolute proper- 
ties of the postshift stimulus. 
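
The argument of this paragraph can be made concrete with a toy system of two opponent units. Everything in the sketch is an assumption made for illustration: the linear tuning curves, the overshoot gain, the exponential recovery, and the read-out rule are hypothetical and are not taken from any of the studies cited. The read-out treats the transient pattern as though it were a stable one, in line with the simple rule stated above.

```python
import math


def stable_rates(stimulus):
    """Stable rates of two broadly tuned opponent units; stimulus in [0, 1]."""
    return stimulus, 1.0 - stimulus


def transient_rates(pre, post, t, gain=0.5, tau=5.0):
    """Rates t seconds after an abrupt shift from `pre` to `post`.

    Each unit over- or undershoots its new stable rate in proportion to
    the change it has just undergone, then recovers exponentially.
    """
    rates = []
    for r_pre, r_post in zip(stable_rates(pre), stable_rates(post)):
        r = r_post + gain * (r_post - r_pre) * math.exp(-t / tau)
        rates.append(max(r, 0.0))  # a rate cannot go below complete suppression
    return tuple(rates)


def decoded_value(rates):
    """Interpret the activity pattern as if it were a stable discharge pattern."""
    a, b = rates
    return a / (a + b)


for t in (0.0, 5.0, 100.0):
    print(t, round(decoded_value(transient_rates(0.3, 0.5, t)), 4))
```

Immediately after the shift the decoded value lies further from the preshift value than the postshift stimulus does, and it returns to the postshift value as the transient decays, which is the form a negative aftereffect takes under this account.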

Several difficulties confront attempts to 
provide a detailed analysis of negative after- 
effects within this framework. First, informa- 
tion available about sensory processing from 
single-unit electrophysiological measurement 
has, with few exceptions (e.g., Hensel & 
Boman, 1960), been obtained using infra- 
human subjects. Second, even if it is assumed 
that similar feature analysis occurs within 
human and infrahuman sensory systems, 
electrophysiological studies have, to date, 
yielded relatively little information about the 
influence of stimulation on the pattern of 
activity within the population of units sensitive 
to the dimension in question. Synthetic popu- 
lations can, however, be assembled by pooling 
measures obtained from different animals. 
Mountcastle et al. (1963), for example, re- 
corded from hinge-joint neurons in 14 macaque 
monkeys and normalized measures by con- 
verting the excitatory angle of each cell into 
a percentage of the total possible angular 
movement of the joint and by converting the 
discharge frequency recorded at different 
joint positions into a percentage of the maximum 
activity level of the cell. Their analysis indi- 
cated that the kinesthetic system possesses 
an opponent-process organization; a larger 
sample would be needed, however, to establish 
the distribution, breadth of tuning, and degree 
of overlap of units with different peak sensi- 
tivities. Third, there have been very few 
parametric studies (e.g., Barlow & Hill, 1963; 
Starr, 1965) of neural activity under stimulus 
conditions which generate negative aftereffects. 
It is unfortunate for psychologists concerned 
with establishing the neural correlates of 
negative aftereffects that physiologists have 
concentrated on study of relationships be- 
tween the stable firing rates of neurons and 
different values of maintained stimulation. 
The degree to which neural response is exag- 
gerated following a change in stimulus value 
is dependent on the magnitude and abruptness 
of the shift (Hensel & Boman, 1960; Mount- 
castle et al., 1963), but it has not yet been 
established whether the period of exposure 
to the preshift stimulus has any effect. Nor 
has attention been given to factors which affect 
recovery from the postshift transitory re- 
sponse level. 
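
The pooling step described above for Mountcastle et al. (1963) amounts to two percentage conversions per cell. The sketch that follows is a minimal illustration under stated assumptions: the function name, the argument layout, and the numbers in the usage line are hypothetical and are not drawn from the original recordings.

```python
def normalize_cell(excitatory_angle_deg, joint_range_deg, rates_by_position):
    """Express one cell's measures as percentages so that cells recorded
    in different animals can be pooled into a synthetic population.

    rates_by_position maps joint angle (degrees) to discharge frequency.
    """
    max_rate = max(rates_by_position.values())
    if joint_range_deg <= 0 or max_rate <= 0:
        raise ValueError("joint range and maximum rate must be positive")
    # Excitatory angle as a percentage of the total possible joint movement.
    angle_pct = 100.0 * excitatory_angle_deg / joint_range_deg
    # Discharge frequency as a percentage of the cell's maximum activity.
    rate_pct = {pos: 100.0 * rate / max_rate
                for pos, rate in rates_by_position.items()}
    return angle_pct, rate_pct


# Made-up values, for illustration only.
angle_pct, rate_pct = normalize_cell(60.0, 120.0, {0: 5.0, 30: 20.0, 60: 40.0})
print(angle_pct, rate_pct)
```

Normalizing each cell against its own maximum removes differences in absolute firing rate between cells and animals, at the cost of discarding that information.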

Weisstein (1969) has recently argued that 
sophisticated use of the masking methodology 
in behavioral experiments can provide in- 
formation about stimulus processing in the 
human visual system analogous to that which 
would result from single-unit electrophysio- 
logical measurement. Masking experiments 
measure the extent to which the detectability 
of one stimulus is modified by exposure to 
another. A variety of masking paradigms is 
available, and Weisstein has discussed the 
inferences which can be drawn from measures 
obtained using particular designs.² To date, 
the masking methodology has mainly been 
used to establish whether or not particular 
stimulus properties are processed in the visual 
system by specialized cells. Little information 
has been gained about the temporal properties 
of contour masking or population values 
within different analyzers. It is difficult to 
test neural enhancement explanations of par- 
ticular negative aftereffects without this type 
of information. As a consequence, the basis of 
assessment adopted in the following section 
is relatively limited; it is asked whether the 
explanation. under consideration successfully 
differentiates stimulus conditions which influ- 
ence the magnitude of aftereffect from those 


igm is similar 


?The traditional aftereffect pa 
in many respects to the cross-adaptation forward 
masking design; the difference is that in the latter case 
the influence of the inducing stimulus on the threshold 
for detection of the target is measured while in the 
former case an observer is required to judge a single 
property of the suprathreshold target following ex- 
posure to the inducing stimulus. We ein has argued 
that characteristics of specific feature a alyzers can be 
inferred only from masking data, claiming that after- 
effects reflect the pooled response of a number of dif- 
ferent types of analyzers. It is known (c.g, Ikeda & 
Obonai, 1953; Singer & Day, 1965), however, that the 
initial magnitude of an aftereffect is independent of the 
period of time subjects are exposed to the inspection 
stimulus; in addition, decay in aftereffect is continuous 
following cessation of inspection. These data suggest 
that aftereffects measure the influence inspection has 
on the response of a specific class of feature analyzer. 
The inferential steps involved in establishing. the 
tuning characteristics of single units from aftereffect 
measures which reflect the activity of many units are 
exactly the same as those when masking data are 
involved. The paradigms proposed by Weisstein 
(1969) for the latter purpose can be paralleled in the 
design of aftereffect experiments. Forward stimulus 
presentation has been employed almost exclusively 
to generate aftereffects even though related distortions 
occur with the test stimulus displayed before the 
inspection stimulus (Obonai, 1957; Taylor, 1962). 
In addition, aftereffects probably occur with juxtaposition
as well as superimposition of the inspection
and test stimuli. The measurement of aftereffects using
metacontrast-type and paracontrast-type paradigms
can possibly contribute as much to the study of feature
analysis as masking experiments, and should receive
more attention.

which do not. Deutsch (1964) and Ganz
(1966a) have examined relationships between
the temporal characteristics of figural after- 
effects and neural inhibition in some detail. 
As temporal variables have similar effects on 
negative aftereffects and figural aftereffects 
(Taylor, 1962), limited reference is made to 
these variables in the following discussion. 


NEURAL CORRELATES OF NEGATIVE AFTER- 
EFFECTS OF TILT AND MOVEMENT 


Negative aftereffects occur following sudden 
variation in kinesthetic line tilt (Gibson, 
1937b), visual line tilt (Gibson & Radner, 
1937), movement (Wohlgemuth, 1911), skin 
temperature (Kenshalo, Nafe, & Brooks, 
1961), luminous intensity (Wright, 1934), 
light intermittency (Vega, Costiloe, & Par- 
sons, 1968), sound intensity (see Starr, 1965), 
sound frequency (Christman & Williams, 
1963), sound location (Krauskopf, 1954), 
angular acceleration (Clark & Stewart, 1968), 
cutaneous vibration (Gescheider & Wright, 
1968), body position (Day & Wade, 1969), 
and probably many other stimulus properties. 
The present section examines evidence bearing 
on whether neural operations which enhance 
stimulus contrast underlie kinesthetic tilt, 
visual tilt, and movement aftereffects. These 
aftereffects have been selected for detailed 
treatment because in each case the negative 
aftereffect has been extensively studied, and
relevant data on sensory functioning are 
available. Although other negative aftereffects 
are given little attention in the present paper 
it should be noted that opponent-process 
systems are also involved in signaling informa- 
tion about luminous intensity (Jacobs, 1965), 
wavelength (De Valois et al., 1963), and skin 
temperature (Hensel et al., 1960). In each
case an abrupt change in stimulus value results 
in neural enhancement of the type described
earlier.


Kinesthetic Tilt Aftereffects 


A horizontal rod feels slightly tilted following
kinesthetic inspection of a tilted rod
(Gibson, 1937b). The magnitude of the
kinesthetic tilt aftereffect is dependent on the
angle at which the rod was tilted during inspection
(Over, 1967a; Singer et al., 1968).
It was shown earlier that normalization theory 
does not explain why the angular function 
for kinesthetic tilt aftereffect takes the form 
it does. The possibility considered now is that 
kinesthetic tilt aftereffects occur through 
inhibitory interaction between units in the 
kinesthetic sensory system. 

The muscle stretch receptors play little part 
in judgment of the spatial position of a limb;
this discrimination is instead mediated by 
input from receptors located in the joint 
capsules, ligaments, and about the tendon 
grooves (see Mountcastle & Powell, 1959; 
Rose & Mountcastle, 1959). Thalamic cells 
responsive to the position of the knee joint 
of the macaque monkey have large excitatory 
angles and are single ended in their discharge 
patterns (Mountcastle et al., 1963). Outside
its excitatory angle, a cell fires spontaneously, 
and within the angle its discharge frequency 
is monotonically related to the position of the 
joint. Most cells are maximally active at an 
extreme joint position, and the excitatory 
angles of a set of thalamic cells related to a 
particular joint commonly overlap so that 
extensor and flexor neurons both discharge 
at a low rate when the joint is at or near the 
midposition. In addition, adjacent cells are 
often reciprocally related to a given joint, 
with flexion increasing the activity of one 
cell and decreasing the discharge of its neigh- 
bor. The opposite relationship is found with 
extension (Mountcastle, 1957). 

In kinesthetic tilt aftereffect experiments 
the subject judges a midposition (e.g., hori- 
zontal) after the arm has been maintained 
away from this position for a period of time. 
Mountcastle et al. (1963) have obtained 
electrophysiological measures in an equivalent 
stimulus situation. They recorded the activity 
of a thalamic cell driven by extension of the 
contralateral knee of the macaque monkey 
with the joint moved from a position where 
discharge was low (the midposition), to an 
excited position (partial extension), and then 
back to the midposition. Four different angles 
of extension were studied. In each case, movement
into the excitatory angle produced an
overshoot in discharge frequency before a
steady rate, dependent on the maintained
position of the limb, was reached. This rate
remained unchanged as long as the limb was 
kept in the same position. Restoration of the 
limb to the midposition resulted in suppression 
of cell activity. Both the degree of suppression 
and the time required for recovery to the
spontaneous discharge level were dependent
on the angle of extension of the limb, and
hence on the activity level of the cell, in the 
preceding period of stimulation. 

An explanation of kinesthetic tilt aftereffects 
can be given if it is assumed that discrimination
of the spatial position of a limb is mediated
by the relative response rates of a set of
neurons with the properties described above.
Prolonged extension of the limb would result 
in suppression of cells sensitive to extension 
when the limb is subsequently returned to the
midposition, but would not affect the activity 
of cells sensitive to flexion. The balance 
normally found in the response of opponent- 
functioning units will now be disturbed, 
and the higher firing rate of flexion cells will 
result in judgment that the limb is located 
in a flexion position. The magnitude of the
perceptual error would be directly related to 
the prior angular displacement of the limb 
from the midposition up to the limits of the 
excitatory cells involved; this is because the 
degree to which suppression occurs is dependent
on the cell’s firing rate. Over (1967a)
has offered an explanation of the angular 
function of kinesthetic tilt aftereffect in these 
terms. 
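
The opponent-process account just outlined can be made concrete with a small numerical sketch. The Python fragment below is illustrative only and is not drawn from any of the studies cited: the linear rate function, the suppression fraction, the difference read-out, and every name and parameter value are assumptions. It is meant only to show how suppression of extension-sensitive units alone would bias judgment of the midposition toward flexion, by an amount that grows with the prior angle of extension.

```python
# Toy opponent-process sketch of the kinesthetic tilt aftereffect.
# All parameter values are illustrative assumptions, not data from the studies cited.

SPONTANEOUS = 10.0   # assumed spontaneous discharge (impulses/s) at the midposition
GAIN = 1.0           # assumed extra impulses/s per degree inside the excitatory angle
SUPPRESSION = 0.2    # assumed fraction of prior excitation subtracted after the limb returns


def steady_rate(angle_deg, preferred_sign):
    """Steady discharge of a single-ended cell.

    preferred_sign is +1 for an extension cell and -1 for a flexion cell;
    positive angles denote extension away from the midposition (0 degrees).
    """
    drive = max(0.0, preferred_sign * angle_deg)  # no change outside the excitatory angle
    return SPONTANEOUS + GAIN * drive


def rate_after_inspection(test_angle, inspect_angle, preferred_sign):
    """Discharge at test_angle immediately after prolonged inspection at inspect_angle."""
    prior_excitation = steady_rate(inspect_angle, preferred_sign) - SPONTANEOUS
    suppressed = steady_rate(test_angle, preferred_sign) - SUPPRESSION * prior_excitation
    return max(0.0, suppressed)  # a firing rate cannot fall below zero


def judged_position(test_angle, inspect_angle):
    """Read limb position (degrees) from the extension-minus-flexion rate difference."""
    extension = rate_after_inspection(test_angle, inspect_angle, +1)
    flexion = rate_after_inspection(test_angle, inspect_angle, -1)
    return (extension - flexion) / GAIN


# Judge the midposition after holding the limb at several angles of extension.
for inspect_angle in (10.0, 20.0, 40.0):
    print(inspect_angle, judged_position(0.0, inspect_angle))
```

In this sketch negative values denote flexion. Because the assumed rate function is linear and unbounded, the error grows without limit; the ceiling imposed by the limits of the excitatory angles, mentioned above, is not modeled.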

Several tests can be made of this analysis 
of kinesthetic tilt aftereffects. Equivalent 
aftereffects should be produced by active and
passive inspection of a tilted rod if kinesthetic
tilt aftereffects are dependent on operations 
occurring within the kinesthetic sensory sys- 
tem rather than the muscle stretch system. 
Zacks and Freedman (1963) obtained larger 
aftereffects with active movement, but Day 
and Singer (1964), who directly monitored 
muscular involvement and controlled for the 
influence of judgment time on aftereffect, 
found identical kinesthetic tilt aftereffects 
with active and passive inspection. In addition,
kinesthetic tilt aftereffects occur with the arm 
maintained in a fixed position relative to the 
horizontal during inspection (Collins, 1967;
Thurner, 1961); this suggests that active
movement of the limb during inspection is not
a necessary condition for aftereffect. Absence
of intermanual transfer of kinesthetic tilt 
aftereffects (Singer & Day, 1966) also indi- 
cates that these aftereffects are dependent 
on processes occurring within the kinesthetic 
sensory system; this is because thalamic
cells responsive to the spatial position of a 
joint are driven by stimulation of the con- 
tralateral side of the body alone (Rose & 
Mountcastle, 1959). 


Visual Tilt Aftereffects 


Day (1962, 1969) proposed that visual tilt 
aftereffects occur because some cells sensitive 
to contour orientation are suppressed when 
a line is displayed at one tilt after having been 
viewed for a period of time at another tilt, 
but the response of other cells is unaffected. 
The discharge pattern generated when the 
postinspection stimulus is first presented is 
therefore that normally signaled by a line at a 
slightly different orientation. The present 
section examines neurophysiological and psy- 
chophysical studies which provide information 
about feature analysis in the visual system 
relevant to this explanation of visual tilt 
aftereffects. 

Many cells in the visual cortex of the cat
(Campbell, Cleland, Cooper, & Enroth-Cugell, 
1968; Hubel & Wiesel, 1965) and the monkey 
(Hubel & Wiesel, 1968) are relatively un- 
responsive to diffuse light, but are highly 
tuned to contour orientation. Some of these 
cells have elliptical receptive fields with either 
excitatory centers and inhibitory flanks, or 
inhibitory centers and excitatory surrounds. 
Stimulation of an inhibitory region by light 
reduces discharge frequency below the rate 
at which the cell fires in darkness, and stimu- 
lation of an excitatory region increases the 
cell’s firing rate above its spontaneous activity 
level. Each cell fires maximally to a particular 
contour orientation, but is nevertheless sensitive
to much of the tilt continuum. The cell
is excited within a limited range (Campbell 
et al., 1968, have reported an average value of 
24 degrees) on each side of the preferred 
direction, and is inhibited, relative to its 
spontaneous firing rate in darkness and the
response levels found at intermediate tilt
positions, when the stimulating contour is
tilted 90 degrees from the preferred direction. 
Cells are aggregated into columns in terms of 
their topographic properties (location in 
visual fields, eye dominance) as well as their
nontopographic characteristics (preferred di- 
rection). 

Contour masking experiments have indi- 
cated there are cells in the visual system which 
function in a similar way. In several recent 
studies, subjects were required to detect either 
a single line or a grating (the target stimulus) 
presented briefly either before (backward 
masking—Parlee, 1969; Sekuler, 1965), to- 
gether with (simultaneous masking—Camp- 
bell & Kulikowski, 1966), or after (forward 
masking—Gilinsky & Doherty, 1969; Houlihan 
& Sekuler, 1968) exposure to another line or a 
grating (the masking stimulus) presented in 
the same part of the visual field. In each 
experiment the detectability of the target 
was measured as a function of the difference 
in orientation of the two stimuli. Similar 
angular functions for masking were found with 
backward, simultaneous, and forward presen- 
tation, and in each case detection was most 
impaired when the masking and target stimuli 
were presented at the same orientation and 
improved as their tilts differed. Exposure 
to the masking stimulus had no effect on judg- 
ment of the target stimulus when the two 
figures differed by 45 degrees. 

Experimenters to date have given little 
attention to development and decay functions 
for the masking of tilted lines. Houlihan and 
Sekuler (1968) examined the extent to which 
forward masking is dependent on the interval 
for which the masking figure is displayed, 
but used only three durations (5, 50, 500 
milliseconds). The sole decay data available 
have come from a backward masking experi- 
ment (Sekuler, 1965) in which masking was 
measured as the interval between offset 
of the target stimulus and onset of the masking
figure was varied. The limited interest so
far shown in the temporal determinants of 
contour masking is disappointing in view of 
the extensive data now available on develop- 
ment and decay functions for aftereffects 
(see Ganz, 1966a). 

Most studies of contour masking have used
vertical and horizontal lines as target stimuli;
one exception is the experiment by Campbell
and Kulikowski (1966) in which measures 
were obtained with a 45-degree target figure. 
A range of target values should be used in 
experiments to establish whether there are 
more neural analyzers maximally responsive 
to particular orientations than others, and 
whether analyzers with different preferred 
directions have the same breadth of tuning. 
Cells with either vertical or horizontal pre- 
ferred directions do not occur more frequently 
than cells with other preferred directions in the 
cat and monkey visual systems (Campbell 
et al., 1968; Hubel & Wiesel, 1965, 1968).
With human subjects, however, vertical and 
horizontal lines are more readily visible at 
brief exposures than oblique lines (Higgins & 
Stultz, 1948; Ogilvie & Taylor, 1958), and 
disappear less readily when viewed under 
Ganzfeld conditions (Craig & Lichtenstein, 
1953). In addition, judgments of whether 
two lines are parallel are more readily made 
about the vertical and horizontal (Andrews, 
1967; Rochlin, 1955). 

Results at present available from electro- 
physiological and masking experiments do, 
however, suggest why an asymmetrical S- 
shaped angular function is obtained for 
visual tilt aftereffects in central vision. Direct 
effects occur because some cells which signal 
the orientation of the test figure are excited 
during inspection and are consequently sup- 
pressed at the time the test figure is displayed, 
while the response of other cells which are 
tuned to the test orientation, but not the 
inspection orientation, is unaffected when the 
test figure is presented following inspection. 
Contour repulsion errors occur only when the 
inspection and test lines fall within the 
excitatory range (45 degrees or so) of single 
tilt analyzers. Assimilative errors (indirect 
effects) are produced when the figures differ 
by more than 45 degrees because cells sensitive 
to the tilt of the inspection figure are inhibited, 
relative to their spontaneous discharge levels, 
during inspection and hence are overexcited, 
rather than suppressed, when the test figure 
is later introduced. If this analysis of the 
mechanism of the indirect effect is correct the 
detection of a vertical line should be facilitated 
(relative to thresholds measured under control 
conditions using a homogeneous field rather
than a single line or a grating as the masking
stimulus) rather than impaired following 
exposure to lines tilted 75 degrees from the 
vertical. The facilitative effect should be less 
pronounced than the masking effect obtained 
when the two contours differ only slightly 
in tilt; this is because indirect visual tilt 
aftereffects are smaller than direct visual tilt
aftereffects (Gibson & Radner, 1937; Morant
& Harris, 1965). Only three masking studies
have used a homogeneous field as a control 
condition. Sekuler (1965) found that exposure 
to a 60-degree grating impaired detection of a 
vertical line, and only one of the three subjects 
tested by Parlee (1969) showed any evidence 
of facilitation when the mask and target 
stimuli differed by 75 degrees. Both of these 
experimenters studied backward masking. 
With forward masking, Houlihan and Sekuler 
(1968) found that inspection of a 42-degree 
grating had the same influence on detection 
of a vertical line as exposure to a homogeneous 
field; unfortunately, they did not obtain
measures with more extreme tilts. Further 
attention should be given to the question of 
whether facilitation in addition to masking 
can occur with the detection of contours. 

Muir and Over (1970) found that indirect 
visual tilt aftereffects are not obtained when 
the inspection and test contours are seen in 
peripheral vision. In interpreting this result 
they proposed that individual cells in the 
periphery are excited over a wider range of 
orientations than foveal cells. Cells active 
during inspection of a line tilted 75 degrees 
from the vertical are therefore suppressed, 
rather than overexcited, when the vertical 
test figure is subsequently displayed. Wiesel 
(1960) has in fact found that ganglion cells in 
the area centralis have small concentrated 
centers and large, strongly antagonistic flanks, 
while peripheral cells have large centers and 
less influential surrounds. The masking meth- 
odology can be used to establish whether cells 
in peripheral vision are less finely tuned to 
contour orientation than foveal cells; the 
problem has not yet, however, been studied 
in this way. 

There are further ways in which the neural 
enhancement explanation of visual tilt aftereffects
may be tested. First, partial rather
than complete interocular transfer of visual
tilt aftereffect should be obtained. This is
because some orientation-sensitive cells are 
excited by stimulation of only one eye, and 
others can be driven either monocularly 
or binocularly (e.g., Hubel & Wiesel, 1968).
In addition, contour masking is more effective 
with monoptic than with dichoptic presentation
of the masking and target figures (Gilinsky & 
Doherty, 1969). Interocular transfer of visual 
tilt aftereffect has not yet been studied. 
Second, contour masking occurs with back- 
ward as well as forward stimulus presentation. 
Visual tilt aftereffects might therefore be found 
with the test line shown before the inspection 
figure providing the interstimulus interval is 
minimal and the test line is presented just 
above the masking threshold. Figural after- 
effects are obtained with stimuli displayed 
in this sequence (Obonai, 1957; Taylor, 1962),
but whether visual tilt aftereffects also occur 
has not yet been investigated. They might be 
expected if neural inhibition is the mechanism 
underlying backward as well as forward 
masking (Weisstein, 1968). 


Movement Aftereffects 


The present section examines the proposal 
that stationary contours seen after exposure 
to a moving figure appear to move in the direc- 
tion opposite to the inspection stimulus be- 
cause visual cells that are maximally responsive 
to the direction in which the inspection figure 
was moving are suppressed after inspection 
relative to cells that are maximally sensitive 
to movement in the opposite direction (Barlow
& Hill, 1963). Other explanations of movement
aftereffect have been discussed by Holland 
(1965) and Wohlgemuth (1911) and are not 
considered here. In the present section, ex- 
periments which used tracking methods to 
measure movement aftereffect are cited in 
preference to studies which have relied on the 
phenomenal reports of subjects. This is be- 
cause tracking methods measure the initial 
magnitude of movement aftereffect and its
rate of decay rather than merely the duration
of the aftereffect; in addition, their explicit
procedural features have permitted movement
aftereffect to be measured using nonhuman
subjects (Scott & Powell, 1963).

Cells which are selectively responsive to the
direction in which contours move across the
visual field have been measured with many 
species (see Creutzfeldt & Sakmann, 1969). 
In one study Barlow and Hill (1963) recorded 
the discharge of movement-sensitive ganglion 
cells in the rabbit retina under stimulus condi- 
tions similar to those that induce movement 
aftereffects. Neural activity was measured 
before, during, and after the eye was stimu- 
lated by a moving figure. Movement in the 
nonpreferred (null) direction did not change 
the cell’s discharge frequency from the spon- 
taneous activity level found before stimulation. 
However, movement in the opposite (pre- 
ferred) direction resulted initially in a large 
increase in neural response before a steady rate, 
dependent on the velocity of the stimulus, 
was attained. Termination of movement 
immediately suppressed the cell, and recovery
to the spontaneous activity level was not
complete (following 57-second exposure to a
figure moving at 15 degrees per second) until 
45 seconds later. This recovery time is com- 
parable with decay values found for movement 
aftereffect (e.g., Taylor, 1963a). Barlow and
Hill (1963) proposed that movement aftereffects
occur because the pattern of neural
activity generated when a stationary figure 
is displayed after exposure to a moving figure 
is similar to that normally produced by slight, 
but maintained, movement in the direction 
which was null during inspection. This im- 
balance occurs as the result of suppression 
of cells which were active during inspection. 
The assumptions underlying this analysis
have been outlined in some detail by Sekuler 
and Pantle (1967). 
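
The imbalance described by Barlow and Hill (1963) can be sketched numerically. The fragment below is a minimal illustration under stated assumptions, not a model fitted to their recordings: the exponential form of recovery, the spontaneous rate, the size of the initial suppression, and the time constant are all hypothetical. The time constant was chosen only so that recovery is nearly complete by about 45 seconds, the order of magnitude reported above.

```python
import math

# Illustrative assumptions only; none of these values comes from the recordings cited.
SPONTANEOUS = 5.0      # assumed spontaneous rate (impulses/s)
INITIAL_DROP = 4.0     # assumed suppression just after the moving figure stops
TIME_CONSTANT = 10.0   # assumed recovery time constant (s)


def preferred_rate(t):
    """Rate, t seconds after movement stops, of a cell whose preferred
    direction matched the inspection movement (suppressed, then recovering)."""
    return SPONTANEOUS - INITIAL_DROP * math.exp(-t / TIME_CONSTANT)


def null_rate(t):
    """Rate of a cell tuned to the opposite direction; inspection movement
    was in its null direction, so it stays at the spontaneous level."""
    return SPONTANEOUS


def aftereffect_signal(t):
    """Difference in rate between the two cells. Positive values correspond
    to apparent movement opposite to the inspection direction."""
    return null_rate(t) - preferred_rate(t)


for t in (0, 5, 15, 45):
    print(t, round(aftereffect_signal(t), 3))
```

On these assumptions the signal is largest immediately after the moving figure stops and declines continuously, which is the qualitative pattern the tracking studies of movement aftereffect are designed to measure.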

Masking studies indicate that the human 
visual system contains cells with directional 
sensitive properties. Prolonged exposure to
contours moving in one direction subsequently
increases the threshold for detection of contours
moving in the same direction, but has
relatively little influence on the detection 
of movement in the opposite direction (Pantle 
& Sekuler, 1969; Sekuler & Ganz, 1963). 
To test Barlow and Hill's explanation of 
movement aftereffect it is necessary, however,
to establish that stimulus variables which
affect the response of movement-sensitive
cells have corresponding influence over movement
aftereffect. Few detailed comparisons
of this type are possible at the present time.
Several examples can be considered. 

There is a complex relationship between 
the magnitude of movement aftereffect and
the velocity of the inspection stimulus; in- 
creasing velocity generates larger aftereffects 
up to a given stimulus value, and thereafter 
smaller movement aftereffects are found 
(Scott & Noland, 1965). This may occur be- 
cause directional sensitive cells each respond 
maximally when the stimulus is moving at a 
particular velocity, and more cells are excited 
when a midvalue of the stimulus range is 
being signaled than when the figure is moving 
extremely slowly or rapidly (see Pantle & 
Sekuler, 1968; Pettigrew, Nikara, & Bishop, 
1968). Not enough is known about tuning 
characteristics within the population of cells 
responsive to movement to establish whether 
this explanation is correct. A similar problem 
arises in determining why inspection of 
centripetal motion produces larger movement 
aftereffects than inspection of centrifugal 
motion (e.g., Bakan & Mizusawa, 1963).
This may result from differences in the number 
of cells sensitive to each direction of motion 
(Oyster, 1968; Tauber & Atkin, 1968), or 
because of learned adaptation to asymmetries 
in the normal visual environment (Scott, 
Lavender, McWhirt, & Powell, 1966). 

Detailed attention should be given to the 
influence of exposure interval on the rate at 
which recovery from neural suppression occurs 
in view of the recent finding that movement 
aftereffects generated by 15-minute inspection 
persist for 20 hours or longer (Masland, 1969). 
In addition, studies which measure post- 
excitatory suppression should use homogeneous 
as well as patterned test fields to determine the 
neural correlates of afterimages found follow- 
ing inspection of moving figures (Grindley & 
Wilkinson, 1953) as well as the reason for the 
"storage" of movement aftereffect which 
occurs when a dark interval is interpolated 
between exposure to the inspection figure and 
presentation of the test stimulus (Honig, 
1967;? Spigel, 1962).

? Honig, W. K. Studies of the "storage" of the aftereffect
of seen movement. Paper presented at the meeting
of the Psychonomic Society, Chicago, October
1967.

Evidence that movement aftereffects transfer
from the eye stimulated during inspection
to the unstimulated eye has in some cases 
(e.g., Freud, 1964) been taken to support the 
position that movement aftereffects are de- 
pendent on cortical rather than retinal proces- 
ses. The occurrence of interocular transfer 
does not by itself, however, indicate where the 
events responsible for movement aftereffect
are located (Day, 1958); this is because half 
of the visual field of each eye projects onto 
each hemisphere so that many cortical cells 
are responsive to stimulation of either eye
(e.g., Hubel & Wiesel, 1968). Restriction of
movement aftereffects to hemiretinas which 
project to the same hemisphere (Freud, 1964) 
would thus occur no matter where in the visual 
system the events responsible for movement 
aftereffect are located. Their locus can be 
partly established, however, by measuring 
transfer of movement aftereffect with the 
influence of the eye stimulated during inspec- 
tion eliminated at the time judgments are 
made with the other eye. This can be done by
applying pressure on the eyeball of sufficient 
intensity to eliminate the response of retinal 
ganglion cells. Interocular transfer of move- 
ment aftereffect occurs with pressure blinding 
of the previously stimulated eye (Barlow & 
Brindley, 1963; Scott & Wood, 1966); therefore,
movement aftereffects found with human
subjects are not dependent on processes 
restricted to the retinal level. Interocular 
transfer of movement aftereffect is, however, 
partial rather than complete (Scott & Wood, 
1966); this is expected from the neural en- 
hancement explanation in that some cortical 
cells excited by movement can be driven by 
stimulation of either eye, but others are 
responsive to stimulation of one eye alone 
(Pettigrew et al., 1968). It would be expected 
that masking effects found with movement 
perception (e.g., Sekuler & Ganz, 1963) also 
transfer interocularly with pressure blinding. 
It is possible that movement aftereffects 
also occur in other senses. Thalman (1922) 
found that a stationary figure on the skin 
appeared to be moving if textured material 
had been previously moved in one direction 
across the same region of the skin. Wohlgemuth 
(1911), however, had earlier found no evidence 
of tactile movement aftereffects. It is surprising
there have been no other studies of movement
aftereffects on the skin in view of the information
that might result about the cause of
aftereffects in vision. Nor has any attention 
been given to the possibility that auditory 
movement aftereffects occur. If similar move- 
ment aftereffects are found in several modali- 
ties, it would be worth investigating whether 
cells in sensory systems other than vision 
respond selectively to the direction in which 
a stimulus is moved and show suppression 
when the movement ceases. 


ADAPTATION AND NEGATIVE AFTEREFFECT


In outlining normalization theory Gibson 
stated:


[T]he present discussion of the generalized negative 
aftereffect does not propose any physiological theory 
whatever to explain it. Our contention is only that the 
effect is a corollary of adaptation and consists of a 
serial shift in stimulus-quality correspondence. If this 
description is correct, physiological theories, whether 
in terms of opposed excitatory processes or of fatigue
or reduced sensitivity of the receptor, or some other 
mechanism, must account for the whole description, 
including the adaptation, and cannot assume that the 
negative aftereffect is nothing more than a simple proc- 
ess of after-discharge [Gibson, 1937b, p. 241]. 


It was shown in an earlier section that refer- 
ence need not be made to perceptual changes 
which occur during inspection when an explanation
is being offered of negative aftereffect.
Inquiry into the nature of the adaptation
process and its relation to negative
aftereffect is of interest nevertheless because 
neutralization in experience often occurs 
while the subject is exposed to the inspection 
stimulus in an aftereffect experiment. In the 
present section it is suggested that what 
Gibson has referred to as adaptation has 
the same neural correlates as negative after- 
effect, and can be differentiated from negative 
aftereffect solely by reference to experimental 
procedures. 

It is not generally recognized that an abrupt 
change in stimulus value occurs at the start, 
as well as the end, of the inspection period 
in an aftereffect experiment. In the former 
case, the change typically is from a less intense
to a more intense stimulus value, and
in the latter case from the intense value to
one which is less intense. The shift from
preinspection to inspection conditions results
in enhanced neural activity similar to that
produced by the shift from inspection to 
postinspection conditions (e.g., Barlow &
Hill, 1963; Hensel & Boman, 1960; Ratliff 
et al., 1963). Equivalent perceptual changes 
should therefore occur in the two cases. To 
label one as adaptation and the other as aftereffect
is to make a distinction based on experimental
procedures rather than cause. The
proposal being made is that the perceptual 
changes which Gibson has referred to as 
adaptation reflect nothing more than recovery 
from the neural enhancement produced by 
transition from preinspection to inspection 
conditions. The rate and amount of adaptation 
would thus be dependent on the degree and 
suddenness of the change in stimulation, but 
the level finally reached would be independent 
of these variables. In addition, the shift from 
an extreme preinspection stimulus value to a 
less extreme inspection value should lead to 
intensification of experience during inspection;
neutralization should occur only when the 
shift is from a less extreme to an extreme value. 
The latter expectations are consistent with 
measures of the perception of brightness ob- 
tained following exposure to different pre- 
inspection values (Jameson & Hurvich, 1949). 
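
The proposal that adaptation and negative aftereffect share a single cause can be illustrated with a short sketch. The fragment below assumes, purely for illustration, that an abrupt change in stimulus value adds a transient proportional to the size of the step, and that this transient decays exponentially; the function name, the gain, and the time constant are hypothetical and are not taken from the studies cited.

```python
import math

# Illustrative assumptions only.
OVERSHOOT_GAIN = 0.5   # assumed fraction of the step added as a transient
TAU = 5.0              # assumed decay time constant of the transient (s)


def signaled_value(t, previous_value, current_value):
    """Stimulus value signaled t seconds after an abrupt step from
    previous_value to current_value (arbitrary units)."""
    step = current_value - previous_value
    transient = OVERSHOOT_GAIN * step * math.exp(-t / TAU)
    return current_value + transient


# Less extreme -> extreme (0 -> 10): the signal starts high and declines (neutralization).
print([round(signaled_value(t, 0.0, 10.0), 2) for t in (0, 5, 30)])

# Extreme -> less extreme (20 -> 10): the signal starts low and rises (intensification).
print([round(signaled_value(t, 20.0, 10.0), 2) for t in (0, 5, 30)])

# Inspection -> neutral test (10 -> 0): the signal is briefly of opposite sign (negative aftereffect).
print([round(signaled_value(t, 10.0, 0.0), 2) for t in (0, 5, 30)])
```

The same function generates all three cases; they differ only in which step the experimenter happens to observe. The final level depends on the current stimulus value alone, while the size of the transient depends on the size of the step, in line with the expectations stated above.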

The neural enhancement position proposes 
that negative aftereffects result whenever the 
shift from inspection to test conditions pro- 
duces exaggerated modification in neural 
response. Adaptation need not occur during 
inspection; the inspection figure need be 
displayed only for an interval sufficient to 
affect the response generated by the test figure. 
The nature of the aftereffect will depend on 
the type of modification that occurs in neural 
activity in the shift from inspection to post- 
inspection conditions. In these terms, when 
appropriate stimulus values are chosen, a test 
figure can evoke an intensified experience 
following exposure to a figure which neutralized 
during inspection; that is, negative aftereffect 
can occur in the direction opposite to normal- 
ization. Consider, as an example, the case 
where the subject inspects a moving figure 
before being presented with a test figure mov- 
ing in the same direction but at a faster speed. 
If the response rate of cells sensitive to movement
is monotonically related to stimulus
velocity (Finklestein & Grüsser, 1965), cells
active during inspection will be overexcited,
rather than suppressed, by the introduction 
of the test figure. With this stimulus arrange- 
ment Carlson (1962) and Rapoport (1964) 
found that the apparent speed of the test figure 
was either increased or unaffected by inspec- 
tion. From Gibson's normalization theory it is 
expected that the apparent speed of the test
figure would be reduced. 


CONCLUDING REMARKS 


Gibson (1959a) has maintained that negative 
aftereffects are basically different from figural 
aftereffects. It is proposed in the present 
paper that neural processes which enhance 
stimulus contrast underlie the two sets of 
aftereffects, but that negative aftereffects 
occur in cases where stimulus properties are 
signaled in sensory systems by nontopographic 
coding principles, and figural aftereffects are 
found when topographic coding is involved. 
In insisting that negative aftereffects occur 
only within continua which are oppositional 
in nature, Gibson, even though he shunned 
physiologizing, showed considerable insight 
into the fact that cells which signal informa- 
tion about contour orientation (visual and 
kinesthetic), movement, wavelength, luminous 
intensity, and skin temperature are organized 
into opponent-process systems. What he has 
referred to as the neutral point or norm of a 
continuum almost certainly corresponds to 
the stimulus value at which maximum overlap 
occurs in the excitatory ranges of units with 
opposed peak sensitivities. This is in essence 
an equilibrium condition, and discrimination 
is most precise about this point because a 
slight change in stimulus value disrupts the 
state of balance by increasing the response 
of some cells and decreasing the discharge 
of their neighbors (Mountcastle et al., 1963). 
Whether the neutral point is also the most 
frequent stimulus value in ordinary experience, 
as Gibson (1937b) has supposed, has not been 
established. 

Even though in normalization theory Gibson 
correctly recognized that negative aftereffects 
occur only within modalities with oppositional 
properties, the explanation he offered of 
negative aftereffects is clearly incorrect. Nega- 
tive aftereffects are not produced as by- 
products of an adaptation process. The present 
paper considered the claim that negative 
aftereffects result from sudden variation in 
stimulus value because the difference in the 
response of opposed neural units is exaggerated 
for a period of time after the shift. If this 
proposal is correct, it is of value to refer to 
dimensional relationships between the in- 
spection and test stimuli only to the extent 
that these relationships reflect the way in 
which neural units interact. In this connection 
it was shown earlier that the properties of the 
tilt continuum cannot be specified solely by 
reference to stimulus values; tilt of line is 
processed differently within the visual and 
kinesthetic systems. An adequate theory of 
negative aftereffects therefore must not only 
describe an error-inducing mechanism but 
must also consider the manner in which sensory 
systems scale stimulus dimensions. 

It is premature at the moment to draw 
firm conclusions about the merits of the 
neural enhancement position. Explanations of 
particular negative aftereffects developed from 
the position cannot be properly tested until 
detailed data are available both on the dis- 
tribution and tuning of different property 
analyzers, and on the way these analyzers 
respond to changes in stimulus value. This 
type of information can be obtained by further 
microelectrode studies of infrahuman sensory 
systems, or by use of the contour masking 
methodology with human subjects. Weisstein 
(1969) has recently exhorted psychologists 
to make sophisticated use of masking methods, 
and has discussed the way particular paradigms 
can be used to obtain information about 
different aspects of feature analysis in the 
human visual system. These paradigms should 
receive detailed attention in the near future 
both because of the possibility of establishing 
parallels between data obtained by this 
method and electrophysiological measurement 
and because of the contribution such data 
can make to the explanation of negative 
aftereffects. 
TALK, SILENCE, AND ANXIETY 


DAVID C. MURRAY1 
Veterans Administration Hospital, New York 
Articles relating anxiety and verbal productivity are reviewed. Studies are 
grouped into those in which anxiety is manipulated by varying environmental 
stress (situational anxiety), measured by choosing subjects differing in vulner- 


and then falling as stress increases, app 
between anxiety and verbal productivity 


This review is concerned with the relation 
to anxiety of one aspect of paralinguistic 
(nonsemantic or nonmeaning) speech, verbal 
productivity, which, as used here, is composed 
of three major divisions: quantity, rate, and 
silence, or how much a person talks, how fast, 
and how much he is silent. Results of the 
many studies examining the relationship of 
anxiety to one or more of the verbal produc- 
tivity variables have lacked consistency. Of 
44 studies relating anxiety and speech quan- 
tity, 13 have shown significant² positive, 16 
significant negative, and 15 nonsignificant re- 
lationships. Of 25 studies dealing with the 
relation of silence to anxiety, 8 reported sig- 
nificant positive, 6 significant negative, and 11 
nonsignificant relationships. For 19 studies of 
verbal rate and anxiety the comparable 
figures are 7, 4, and 8. 
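
The tallies just given can be checked and expressed as proportions with a short script. This is only an illustrative sketch: the counts are those stated in the text, while the dictionary layout and the helper name `proportions` are assumptions of the example rather than anything used in the studies reviewed.

```python
# Number of studies reporting each outcome, as stated in the text:
# (significant positive, significant negative, nonsignificant).
TALLIES = {
    "quantity": (13, 16, 15),
    "silence": (8, 6, 11),
    "rate": (7, 4, 8),
}


def proportions(counts):
    """Return each count as a fraction of the total number of studies."""
    total = sum(counts)
    return tuple(c / total for c in counts)


for measure, counts in TALLIES.items():
    pos, neg, ns = proportions(counts)
    print(f"{measure:8s} n={sum(counts):2d}  "
          f"positive {pos:.0%}  negative {neg:.0%}  nonsignificant {ns:.0%}")
```

For each measure no single outcome accounts for a majority of the studies, which is the inconsistency the review sets out to explain.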

There seem to be three possible conclu- 
sions: (a) Anxiety is not related in any mean- 
ingful way to verbal productivity, and sig- 
nificant results in either direction are due to 


1 Various members of the psychology department 
of Syracuse University have read, commented on, 
or made suggestions used in this review. The author 
is particularly grateful to Sanford Dean, Ronald 
Kurz, Donald Meyer, and J. Conrad Schwarz. They 
are of course in no way responsible for any errors 
of fact or deficiencies in interpretation. 

Requests for reprints should be sent to David C. 
Murray, Veterans Administration, 803 South Salina 
Street, Syracuse, New York 13202. 

² In this review the term "significant" indicates 
the .05 level of probability and “very significant" 
the .01 level. 


other factors. With so many significant re- 
sults this seems an unlikely hypothesis. How- 
ever, the reviewer has accumulated a list of 
roughly 175 dispositional and situational vari- 
ables which have been studied in relation to 
verbal productivity. Many of these have been 
found in one or more studies to relate sig- 
nificantly to verbal productivity. Uncontrolled 
variation in one or more of these, or other 
factors, may be affecting results relating anx- 
iety to verbal productivity. (b) Anxiety is not 
a unitary concept, but is made up of several 
factors, one or more of which are positively 
related, one or more negatively related, and 
some of which may be unrelated to verbal 
productivity. (c) Anxiety and verbal produc- 
tivity have a curvilinear rather than a recti- 
linear relationship. Increasing the level of anx- 
iety from mild to moderate would raise pro- 
ductivity, but further increases to very high 
levels of anxiety would lower productivity. 
This is referred to as the U-curve relationship 
or hypothesis. 
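
A toy model may make the third possibility concrete. The sketch below assumes, purely for illustration, that productivity is a quadratic function of anxiety on a 0-to-1 scale with its maximum at a moderate level; neither the functional form nor the scale comes from the studies reviewed. It shows how comparing two anxiety levels can yield a positive, a negative, or no difference, depending on where the two levels fall on the curve.

```python
def productivity(anxiety, peak=0.5):
    """Hypothetical curve: highest at `peak`, lower on either side.

    The quadratic form and the 0-to-1 anxiety scale are assumptions
    made for illustration only.
    """
    return 1.0 - (anxiety - peak) ** 2


def direction(low, high):
    """Direction of the change in productivity as anxiety rises from low to high."""
    diff = productivity(high) - productivity(low)
    if abs(diff) < 1e-9:
        return "none"
    return "positive" if diff > 0 else "negative"


print(direction(0.1, 0.5))  # mild versus moderate anxiety
print(direction(0.5, 0.9))  # moderate versus severe anxiety
print(direction(0.3, 0.7))  # two levels equally distant from the peak
```

Under this assumption a two-condition study can report any of the three outcomes tallied earlier without the underlying relationship changing.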

The most frequently used measures of 
verbal quantity are time talking, number of 
words, number of clause units, number of in- 
teractions, and duration (time talking divided 
by number of interactions). There is consider- 
able evidence that the first three of these are 
highly correlated, often at levels above .9 
(for example, Matarazzo, Holman, & Wiens, 
1967). There is equally good evidence that 
the last two, particularly in a fixed time 
monologue or dialogue, have a high negative 
correlation with each other. Number of 
clause units has a significant negative correla- 
tion with number of interactions, and only a 
moderate positive correlation with duration 
(for example, Hare, Waxler, Saslow, & 
Matarazzo, 1960). 
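
The duration measure defined above is a simple ratio, and a small sketch shows why it tends to fall as the number of interactions rises when total talking time is fixed. The function name and the 300-second figure are hypothetical choices for the example, not values from any study cited.

```python
def duration(time_talking, interactions):
    """Mean length of an utterance: time talking divided by number of interactions."""
    if interactions <= 0:
        raise ValueError("interactions must be positive")
    return time_talking / interactions


# Hypothetical speakers who each talk for 300 seconds in total.
for n in (5, 10, 20):
    print(n, "interactions ->", duration(300, n), "seconds each")
```

With time talking held constant, every additional interaction shortens the average utterance, which is consistent with the high negative correlation reported between the two measures.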

There appear to be two basic types of 
silence measures, one based on pauses or hesi- 
tations during speech and the other on the 
length of time before a person starts talking 
(latency or reaction time). Pope and Siegman 
(1964, 1966, 1968) reported positive correla- 
tions between these two ranging from a very 
significant .58 down to a nonsignificant .30. 
The various forms of pause measures inter- 
correlate variously, pauses over and under 1 
second appearing to have little relationship 
(Levin & Silverman, 1965) while number of 
pauses and time pausing have a reported cor- 
relation of .66 (Levin, Silverman, & Ford, 
1967). 

The two rate measures, speech and articula- 
tion, do not appear related (Goldman-Eisler, 
1956), and so few studies have used articula- 
tion rate that it is not considered. 

The longer a person talks the more chances 
he has to pause, but assuming that time is 
equated in verbal samples, verbal quantity 
and pauses tend to be negatively correlated, 
sometimes at statistically significant levels 
ranging as high as —.68 (Goldman-Eisler, 
1961), while verbal quantity and reaction 
time also tend to be negatively related, though 
significantly so in only one (Matarazzo, 
Saslow, & Hare, 1958) of five studies. The 
longer a person talks the slower his speech 
rate, though of four studies none reported a 
significant relationship (for example, Kelly & 
Steer, 1949). However, when either number 
of words or percentage of time talking is 
used as a measure of speech quantity, reported 
correlations are almost always positive and 
often significant, ranging as high as .48 (Kelly 
& Steer, 1949). Speech rate and the silence 
measures are negatively correlated in all cases, 
significantly so in the four studies reporting 
correlations with pauses ranging as high as 
-.94 (Goldman-Eisler, 1956) and nonsignifi- 
cantly so for the three studies reporting cor- 
relations with reaction time (Pope & Siegman, 
1964, 1966, 1968). 

In general, it appears that verbal produc- 
tivity tends to be measured positively by 
verbal quantity and speech rate, negatively 
by the silence measures. 

A number of studies are omitted from this 
review because other variables such as condi- 
tioning, the effects of a speech course, strength 
of opinions, intelligence, and differences in 
pressure to speak appear to be confounded 
with anxiety, making interpretation hazard- 
ous. Also omitted are several studies with 
only one or two subjects and a few studies 
that do not fit into the anxiety categories 
used. 

To consider the evidence so far, available 
studies have been classified into three group- 
ings according to the operation used to mea- 
sure or arouse anxiety and designated situa- 
tional, dispositional, and concurrent anxiety. 


SITUATIONAL ANXIETY 


A number of studies, summarized in 
Table 1, sought to manipulate anxiety by 
varying the presumed stressfulness of the situ- 
ation. All but 6 of these 27 studies used col- 
lege students, including student nurses, as 
subjects. 


Negative Audience Response 


In four studies, subjects spoke before recep- 
tive approving or unreceptive disapproving 
audiences. Cervin (1956) placed each of 64 
undergraduates with two confederates of the 
experimenter who in one condition consist- 
ently agreed with and approved the subject’s 
statements in making up a group story to a 
Thematic Apperception Test (TAT) card, 
while in the other they consistently disagreed 
with and disapproved of the subject’s strongly 
held opinions on a topic. Subjects receiving 
disapproval spoke a very significantly smaller 
proportion of the time (.25 to .40) and with 
a very significantly longer latency (over 75 
seconds as compared to less than 10 seconds). 

Male undergraduates gave short speeches 
after a confederate of the experimenters (Miller, 
Zavos, Vlandis, & Rosenbaum, 1961) either 
received approval while the subject did not, 
neither the confederate nor the subject 
received approval, or both received 
approval. Subjects experiencing less approval 
than the confederate spoke slightly less. In a 
more complex version of this design (Miller, 
1964), with 108 male undergraduates, those 

TABLE 1 
DIRECTION OF RELATIONSHIP BETWEEN SITUATIONAL ANXIETY AND THE 
VERBAL PRODUCTIVITY MEASURES a 


Studies    Stress situations    Subjects b    Verbal productivity measures: Quantity    Silence c    Rate 


Cervin (1956) Neg. Res, 64 Coll. — 
39 Coll. — 
108 Coll. —*U 
108 Coll. —* 
48 Child —** 
80 Coll. 


Miller et al. (1961) 


Miller (1964) 
Vlandis (1964) 

Levin et al. (1960) 

Geer (1966) 

Sauer & Marcuse (1957) 
Kanfer (1959) 

Kanfer (1960) 
Siegman & Pope (1965a) 
Siegman & Pope (1966) 
Pope & Siegman (1967) 
Drennen & Wiggins (1964) 
Allen et al. (1965) 

Reece & Whitman (1962) 
Reece (1964) 

Pope & Siegman (1968) 
McCoy (1965) 

Suedfeld et al. (1964) 
Suedfeld et al. (1965) 
Oyamada (1966) 

Walters & Henning (1962) 
Zuckerman et al. (1962) St. Depr. 
Fenz & Epstein (1962) Parachute 
Kanfer (1958a) Sh. Thr. 
Kanfer (1958b) Sh. Thr. 
a Direction ... relationship with anxiety. Where there are three or more levels of ... 
Publ. Sp. = public speaking; Ov. Rec. = overt recording; St. Depr. = stimulus deprivation; ... 
b Coll. = college students; HS = high school students; SN = student nurses; NP = neuropsychiatric patients; Norm = normal adults. 
c P = pause; L = latency measure of silence. 
d Significant for high but not low MAS scorers. 
e Not significant for latency. 
f Not significant for pause. 
* p < .05. 
** p < .01. 


subjects receiving disapproval after the con- 
federate received approval (increased stress) 
spoke significantly less than those receiving 
the same treatment as the preceding speaker 
(approval or disapproval), while those receiv- 
ing approval after the first speaker received 
disapproval (decreased stress) were in the 
middle. In a third variation (Vlandis, 1964) 
108 undergraduates gave short speeches, for 
the first third of which the experimenter said 
nothing, but for the second and last thirds of 
which he used varying combinations of 
approval followed by disapproval, disapproval 
by approval, approval by approval, approval 
followed by no comment, etc. Subjects going ... 
group was over seven times as silent as the control ... 


Speaking Before Others 


In three studies stress was varied by manipulating the 
audience. In that of Levin, Baldwin, Gallwey, 
and Paivio (1960), 48 ten- to twelve-year-old 
children told a story to a sentence stem first 
to the experimenter alone and then to an 
audience of seven adults. Almost all told very 
significantly shorter stories to the larger, sup- 
posedly more stressful audience. (Mean sec- 
onds speaking were 186 to 278; computed by 
the reviewer from data in Levin et al., 1960.) 
From the self-ratings of 80 female under- 
graduates, Geer (1966) selected 20 high and 
20 low in fear of speaking before a group. 
Subjects gave 1-minute speeches on a task 
they had just completed, believing a group 
was observing them. For the first 30 seconds 
speech rate was very significantly slower for 
high-fear subjects, and mean silence quotient 
significantly higher (8.8 versus 4.4 seconds). 
Since speeches were monologues, and of even 
length with only the first 30 seconds ana- 
lyzed, speech rate is also a measure of speech 
quantity, so results are consistent with those 
of Levin et al. (1960). 
In Sauer and Marcuse (1957) the manipu- 
lation was between overt and covert recording 
of TAT stories, the overt recording presum- 
ably increasing the potential audience in 
subjects’ minds. Word count was higher for 
those overtly than for those covertly recorded, 
significantly so for subjects with high Mani- 
fest Anxiety Scale (MAS) scores (380 versus 
336, 402 versus 380 for those with low MAS 
scores). Latency was very significantly shorter 
under overt recording conditions (39.8 versus 
53.6 seconds for those with high and 46.7 
versus 59.8 for those with low MAS scores) 
and word rate significantly faster. Results are 
the opposite of those in the other studies of 
situational anxiety covered so far. Only a 
quarter of subjects reported increased diffi- 
culty with overt recording so stress manipu- 
lation appears weaker than with other studies 
of situational anxiety. Assuming covert record- 
ing provided mild and overt recording mod- 
erate stress, results would be consistent with 
an inverted-U relationship between anxiety 
and verbal quantity. 


Stressful Topics 
A number of studies have attempted to 
manipulate stress through the topic on which 
subjects are instructed to speak, some topics 
being assumed more stressful than others, and 
some topics being rated from content with 
regard to subjects’ adjustment. Topics on 
which subjects’ content indicated poorer ad- 
justment were presumably more stressful for 
them. Kanfer (1959) asked 29 college stu- 
dents of both sexes to give 3-minute mono- 
logues on each of five topics. No group differ- 
ences in speech rate were found, but there was 
a significant tendency for subjects to talk 
faster on those topics where content suggested 
poorer adjustment. In a similar design with 
38 adult, married, female neuropsychiatric 
patients, Kanfer (1960) found verbal rate to 
differ very significantly among four topics, 
with the highest verbal rate being found when 
subjects spoke about their illness, assumed to 
be the most stressful topic. There was a non- 
significant tendency for those topics in which 
content was rated most stressful to have the 
highest verbal rate. 

In these studies, Kanfer dealt with a mono- 
logue-type situation and equal time periods, 
so rate and quantity are the same. In the 
second study, Kanfer (1960) used neuropsy- 
chiatric patients for subjects, and their desire 
to talk to a professional about their problems 
could have more than counterbalanced the 
usual negative effect of stress on verbal quan- 
tity. Another possibility is that when topic is 
used to manipulate anxiety, any effect on 
verbal quantity may be minimal at first, since 
most people probably have some nonanxious 
stereotyped comments they can make about 
problem areas, giving a listing of symptoms 
or the history of the trouble. Allowing only 3 
minutes for speeches may have altered the 
relationship. 

Kanfer’s second study raises yet a third 
possibility, since he collected eyeblink data on 
all subjects. Eyeblink appears to be a particu- 
larly suitable index of generalized muscular 
tension (Meyer, 1953; Meyer, Bahrick, & 
...) and thus often indicative of anxi- 
ety. Eyeblink figures suggest that of 
the four topics used the least anxiety-provok- 
ing was sex, with family and home inter- 
mediate, and illness the most stressful. (Sex 
involved these married women in talking posi- 
tively about their husbands, and so could well 
have been minimally stressful.) Verbal rate 
figures showed illness with the highest and 
sex with the lowest verbal rates, indicating a 
positive relationship between rate (or quan- 
tity) and stress. The eyeblink data also 
showed steadily decreasing eyeblink rate in 
each of four succeeding 30-second time pe- 
riods, suggesting decreasing stress with each 
period. To fit the U curve the four figures for 
each topic must show one of three configura- 
tions. Assuming high stress to begin with, the 
four figures can show progressively increasing 
talk as stress is reduced. Assuming somewhat 
lower stress to start, the four figures can 
show first increasing and then decreasing talk. 
Assuming moderate stress to begin with, the 
four figures can show progressively decreasing 
talk as stress becomes milder. If the eyeblink 
data can be generalized to the four 30-second 
time periods in Kanfer's first study as well, 
the data from five of the topics exactly fit 
the U curve, data from three come very close 
to fitting it, and data from only one do not 
fit it well. 
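
The three configurations just described amount to a simple rule: if stress falls steadily across the four periods, talk must rise throughout, rise and then fall, or fall throughout. A minimal sketch of that rule follows. The function name, the treatment of ties as not fitting, and the sample figures are all assumptions of the example, not Kanfer's data or scoring procedure.

```python
def inverted_u_configuration(values):
    """Classify a series of talk figures recorded while stress steadily declines.

    Returns 'increasing', 'rise-then-fall', or 'decreasing' when the series
    matches one of the three configurations described in the text, and None
    otherwise.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    if not diffs or any(d == 0 for d in diffs):
        return None  # ties are treated as not fitting (an assumption)
    rising = [d > 0 for d in diffs]
    if all(rising):
        return "increasing"      # began at high stress
    if not any(rising):
        return "decreasing"      # began at moderate stress
    first_fall = rising.index(False)
    if first_fall > 0 and not any(rising[first_fall:]):
        return "rise-then-fall"  # began somewhat below high stress
    return None


# Hypothetical word counts for four successive 30-second periods.
print(inverted_u_configuration([40, 46, 51, 55]))
print(inverted_u_configuration([40, 52, 49, 44]))
print(inverted_u_configuration([55, 51, 46, 40]))
print(inverted_u_configuration([50, 42, 53, 45]))
```

Any series that falls and later rises again is rejected, since no starting point on an inverted-U curve could produce it under steadily declining stress.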

In four studies Siegman and Pope explored 
the effects of topic on verbal productivity. 
One study is not considered here (Pope & 
Siegman, 1965) because there was differential 
pressure to speak on the two topics, as noted 
in a later paper (Pope & Siegman, 1967). In 
all studies, female student nurses were inter- 
viewed on a low (schooling) and high (family 
relations) stress topic. Self-rating, bipolar 
scales indicated the family relations topic to 
be only moderately stressful, but more stress- 
ful than the schooling topic, in the first two 
studies discussed. 

Siegman and Pope (1965a) found no sig- 
nificant relationship between topical focus and 
either reaction time or silence quotient for 50 
subjects. With 16 subjects, Siegman and Pope 
(1966) found verbal productivity lower on 
the stress topic and reaction time shorter, 
neither at acceptable levels of significance, 
but silence quotient was significantly shorter. 
In the last of these studies (Pope & Siegman, 
1967), with 32 subjects, the family topic re- 
sulted in very significantly fewer words (58 
versus 89) per response, but differences in 
reaction time were nonsignificant (direction 
not given). 
Interviewer Climate 


It seems reasonable to consider studies 
manipulating interviewer climate, warm or 
cold attitudes towards the interviewee, as 
related to anxiety, with cold interviews pro- 
viding more stress. Pope and Siegman (1968) 
noted that cold interviews increased speech 
disturbances, suggesting that they were arous- 
ing anxiety. 

Drennen and Wiggins (1964) studied 10 
chronic schizophrenics seen weekly for 10 
group-therapy sessions, conducted alternately 
by a pair of congenial, supportive therapists 
and a pair of hypercritical, nonsupportive 
therapists. The cold approach resulted in con- 
siderably and significantly reduced interac- 
tions in the first session, compared to the 
warm approach (74 to 136). By the fifth 
week, patients may have adapted to the cold 
therapists' attitude and were no longer made 
anxious by it, for patients now talked as 
much with either pair of therapists. 

In a rather weak manipulation of the 
warm-cold set (Allen, Wiens, Weitman, & 
Saslow, 1965), 40 male civil-service applicants 
were told the interview would be very warm 
or rather cold, though actual behavior re- 
mained unchanged. The cold set resulted in 
significantly (over one-third) longer latencies, 
and slightly longer durations of interviewees’ 
interactions. 

Reece and Whitman (1962) assigned 69 
college students randomly to groups receiving 
either warm or cold experimenter behavior as 
conveyed nonverbally by the experimenter 
showing interest and attention or disinterest 
and nonattention. Subjects were told to say 
disconnected words for 15 minutes. Under 
cold conditions they said very significantly 
fewer words (about 298 to 328). In a design 


with a similar manipulation, ...) had 32 student ... 
with one of the 
interviewers said to be and trained to behave 
warmly and the other said and trained to be 
cold. Subjects in the cold condition gave sig- 
nificantly fewer words per response (59 to ... 

The results on warm-cold interviews are 
almost consistent. Of five studies, four re- 
ported a significant negative relationship be- 
tween verbal quantity and situational anxi- 
ety, and one reported a very small, nonsig- 
nificant positive relationship. Both of the two 
studies of silence and situational anxiety 
showed positive relationships, one significantly 
so. 


Test Instructions 

McCoy (1965) appears to have been ma- 
nipulating stress when she told 28 fourth- 
grade boys that they were taking a drawing 
test (stress) and 28 that they were taking 
part in a game (nonstress), and then had 
each group do two drawings. Test instructions 
very significantly reduced the subjects’ free 
verbalizations as compared to game instruc- 
tions, more than halving them. 


Stimulus Deprivation 

Several studies varied accessibility to stim- 
uli, with stress presumably increasing from 
the normal situation through social isolation 
to stimulus deprivation. These studies are 
presented very briefly, since it is possible that 
stimulus deprivation involves physiological 
consequences unrelated to and confounding 
the effects of stress variations. In three of 
these experiments (Oyamada, 1966; Sued- 
feld, Grissom, & Vernon, 1964; Suedfeld, 
Vernon, Stubbs, & Karlins, 1965) subjects 
endured 24 hours of severe stimulus depriva- 
tion, a comparable period of social isolation, 
or the control condition. In all studies verbal 
quantity was higher in the social isolation 
(moderately stressful) condition, and lower 
in the stimulus deprivation (highly stressful) 
condition, significantly so in the two Sued- 
feld experiments. In Suedfeld et al. (1965) 
four subjects in the isolation group spent the 
entire time, inadvertently, in temperatures 
over 90° Fahrenheit, an extremely stress- 
ful situation, and their speech decrement was 
300% greater than for the stimulus depriva- 
tion condition. Oyamada (1966) also reported 
speech rate to be lower in the deprivation than 
in the control group. 

In somewhat similar studies Walters and 
Henning (1962) found high school boys’ 
speech quantity did not change significantly 
after 3 hours, but was significantly reduced 
after 6 hours of isolation, while Zuckerman, 
Albright, Marks, and Miller (1962) reported 
a significant reduction in verbal quantity for 
student nurses confined for 6 hours in an iron 
lung with rather complete stimulus depriva- 
tion, whereas a similarly confined group with 
little stimulus deprivation showed little 
change. 

Results of the stimulus deprivation studies 
then are consistent with most other situa- 
tional anxiety studies in revealing a drop in 
verbal quantity, usually significant, with se- 
vere stress. In addition, there is the strong 
indication in three of the studies that mod- 
erate stress raises verbal quantity. 


Parachute Jumping 


On the reasonable assumption that para- 
chutists would be stressed by looking at pic- 
tures about parachuting shortly before making 
a jump, Fenz and Epstein (1962) tested 32 
collegians—16 control and 16 novice para- 
chutists—on verbal responses to pictures un- 
related to, slightly, and strongly related to 
parachuting, in two sessions 2 weeks apart. 
The parachutists were tested 2 weeks away 
from (control day) and then on the day of 
(and before) a jump. For parachutists on the 
control day there was a slight progressive 
decrease in reaction time with increasing 
stimulus relevance. On the jump day, para- 
chutists' reaction time was about the same 
as for the control day on both the neutral 
and the low relevance stimuli, but sharply 
longer on the high relevance stimulus. Effect 
across stimulus dimensions was very signifi- 
cant. Parachutists’ reaction time on the day 
of the jump to the high relevance stimulus was 
far longer than the control group's. Pause time 
varied nonsignificantly for the parachutists 
on the day of the jump in the same manner, 
mean pauses per 100 seconds being 13.1, 8.6, 
and 16.7 for the neutral, low-relevant, and 
high-relevant pictures, with minimal changes 
for the control group and for the parachutists 
on the control day. Inverted-V curves were 
found for parachutists on the day of jump for 
response time, number of words, and rate of 
responding, but differences were not significant 
for stimulus dimension or for interaction of 
stimulus dimension with experimental condi- 
tion. That stress was indeed being manipu- 
lated in this study was suggested by galvanic 
skin response readings of subjects as they 
looked at the pictures, with a steady increase 
in galvanic skin response reactions from neu- 
tral, to low, to high relevance cards for para- 
chutists on the day of the jump. 


Electric Shock 


Two studies by Kanfer used electric shock 
preceded by a tone for subjects asked to say 
separate words until told to stop. In the first 
(Kanfer, 1958a) there were 12 acquisition 
followed by 5 extinction trials at intervals of 
1 to 3 minutes for the 17 undergraduate ex- 
perimental subjects. Of three control groups 
the first received shock only, the second tone 
only, and the last neither. For the four 
groups, total word count during the 52 min- 
utes of the experiment did not differ signifi- 
cantly, though there was a tendency for 
shock, either preceded by tone or alone, to 
reduce the word count while tone alone re- 
duced it even more. For verbal rate there 
were changes in the experimental group due to 
tone-shock pairings not found in the three 
control groups, with a progressive increase of 
verbal rate after the tone, an initial further 
increase followed by a decline just prior to 
the shock, and an additional decline post- 
shock. Whether these changes in the experi- 
mental group are significant is not indicated. 
In a similar design (Kanfer, 1958b), with 
conditioning spread over a 9-day period for 
12 male undergraduates, similar changes (sig- 
nificant) were shown, with word rate increas- 
ing posttone and at first decreasing pre- and 
postshock, but in later trials, unlike the 
earlier study, word-rate increase was main- 
tained during the preshock period with the 
same postshock decline continuing. 

Heart rate data, which may well be a 
measure of anxiety, were obtained in the first 
study (Kanfer, 1958a), and in general where 
these data suggest high anxiety, verbal rate 
decreases. For the most part heart rate in- 
creased steadily from pretone to postshock, 
and this was accompanied by first increasing 
and then decreasing verbal rate. If heart rate 
is a measure of anxiety, and generalizing to 
the second study as well, it appears that as 
anxiety increases from pretone to postshock, 
verbal rate follows the inverted-U curve. 


Summary for Situational Anxiety 


Looking at Table 1 it is apparent that with 
two exceptions there is a negative relationship 
with verbal quantity in studies where there 
are only two stress conditions, or between 
the lower and higher of three stress condi- 
tions. Where there are three stress conditions 
the inverted-U relationship frequently ap- 
pears. In the only two exceptions to the nega- 
tive relationship between situational anxiety 
and verbal quantity, stress may have been 
moderate rather than severe and so near the 
asymptote of the U curve, causing an in- 
crease in verbal quantity as compared with 
mild stress. 


It appears that mild or minimal stress is 
somewhat to the left of the inverted-U curve's 
asymptote, and severe stress is far to the 
right, so that mild stress results in higher 
productivity than severe stress, yielding the 
negative relationship between situational anxi- 
ety and productivity that is almost always 
found. 

Five of the rate studies indicate a positive 
relationship between rate and situational 
anxiety, but in one of them the stress situa- 
tion was only moderate, and in four there are 
strong indications that peak rates are at mod- 
erate stress levels, with declining rates as 
stress either increases or decreases. In the 
remaining four studies, high stress reduced 
rate as it had verbal quantity. The studies 
involving silence are also consistent, with 
high situational anxiety usually increasing 
silence. There is thus a very strong tendency 
for high stress to reduce verbal productivity, 
however measured, and for moderate stress 
to increase it, as compared to mild stress. 


DISPOSITIONAL ANXIETY 


A number of studies attempted to measure 
anxiety through scores on paper-and-pencil 
self-report inventories. Scores on such inven- 
tories appear to represent subjects’ levels of 
readiness to react to situations with anxiety, 
functioning as measures of stress tolerance or 
stress vulnerability, and relating to subjects’ 
dispositions. 

The majority of these studies used the 
Manifest Anxiety Scale (MAS) and one the 
Emotional Responsiveness (E) Scale, both 
originally constructed as measures of drive 
level. Other measures include a self-conscious- 
ness scale, the Cornell Index, and the Chil- 
dren’s Test Anxiety Scale (TAS). With two 
exceptions all studies were of college students 
(including student nurses) or neuropsychiatric 
patients. 

Although studies of dispositional anxiety 
often ignore the stress characteristics of the 
situation in which verbal samples are gath- 
ered, subjects produce verbal behavior under 
conditions of more or less stress. It would be 
instructive if, in studies of dispositional anxi- 
ety, degree of stress could be specified, but 
this is usually impossible. Some of the studies 
took verbal samples of subjects high and low 
in dispositional anxiety under two levels of
stress, and verbal productivity of the two
groups can be compared under the two stress 
conditions. The U-curve hypothesis suggests 
that when verbal samples are collected in the 
usual low stress situation, subjects high in 
dispositional anxiety will be more productive 
verbally than those low in dispositional anxiety.
Under high stress the relationship should
reverse. Under moderate stress, subjects high 
and low in dispositional anxiety should show 
little difference in verbal productivity. 

A look at Table 2 gives an overview of 
studies of dispositional anxiety. 


Manifest Anxiety Scale Studies 


Using verbatim recordings of TAT stories 
of 23 college students whose MAS scores were 
in the upper or lower 17% of a larger norma- 
tive group, Benton, Hartman, and Sarason 
(1955) found subjects with higher MAS scores 
used more words (2405 to 1882, p = .06), 
had a significantly shorter (more than halved) 
latency and spoke slightly more rapidly. 

Cervin (1956) placed individual college 
students in two sessions with two confederates
of the experimenter. In one session the group
of three chose a TAT card and made up a 
story about it, with the confederates approv- 
ing and encouraging. In the other session the 
group discussed a topic on which the subject 
was known to have a strong opinion, and the 
confederates took the initiative in disagreeing 
strongly with that opinion without waiting 
for the subject to speak. The 64 subjects were
equally divided into groups with high and 
low MAS scores. The MAS scores were posi- 
tively related to verbal quantity and nega- 
tively to latency, which is consistent with the 
majority of studies of dispositional anxiety. 
Under the low stress condition, subjects with 
high MAS scores talked more, with less la- 
tency, but the differences were small. Under 
high stress, where the U-curve hypothesis 
predicts a reversal, subjects with high MAS 
scores continued to talk a shade more, and 
with significantly shorter latency. In the high 
stress situation it appears that the subject 
had to be very insistent to get a word in. 
Therefore, latency is a measure of the sub- 
ject’s determination to interrupt the two con- 
federates, and this is quite different than 
situations where latency measures how soon 
the subject breaks a silence. It is not clear 
therefore whether high stress results for dis- 
positional anxiety are inconsistent with or 
merely unrelated to the U-curve hypothesis. 

In the Sauer and Marcuse (1957) study of 
college students overtly or covertly recorded 
while responding to TAT cards, groups with 
high MAS scores spoke less, slower, and with 
shorter latencies. All differences were nonsig- 
nificant, whether broken down by type of re- 
cording or not. 

With a group of 40 mixed neuropsychiatric 
inpatients and outpatients of both sexes, in a 
semistandardized initial psychiatric diagnostic 
interview, Matarazzo, Matarazzo, Saslow, and 
Phillips (1958) reported minimal correlations 
(direction not given) between the MAS and 
their two measures of verbal quantity, pa- 
tients’ units (number of interactions) and 
patients’ actions (average duration of patients’
comments), and a nonsignificant correlation
of —.20 with latency.

Kanfer (1960) had 38 neuropsychiatric pa- 
tients talk on four topics and measured rate 
in 30-second intervals for each topic. For the
initial 30 seconds he found a nonsignificant
correlation of —.260 between rate and MAS, 
while for the three subsequent 30-second 
periods correlations were even lower, and the 
direction was not given. 

After giving them a short form of the 
MAS, Siegman and Pope (1965b) asked fe- 
male nursing students to talk on two assigned 
topics. Correlations with MAS were .31 (p <
.05) for number of words, —.04 for reaction
time, and —.15 for silence quotient. 

First obtaining three measures of anxiety, 
Eisenman (1966) recorded neuropsychiatric 
inpatients’ talk time in three psychotherapy 
groups for five consecutive group meetings. 
Scores for anxiety measures were dichoto- 
mized at the median. Significantly more time 
was spent speaking by high- than by low- 
anxiety subjects, with results significant at 
the .01 level for the MAS and Cornell Index, 
and the .05 level for anxiety ratings based on 
figure drawing. On a replication with three 
new groups results were equally significant. 

Preston and Gardner (1967) intercorre- 
lated a very large number of variables, taken 
from 95 college students, including duration, 
number of words, and number of pauses, and 
three anxiety measures. The MAS (short 
form) correlated .19 with duration (just miss- 
ing significance), —.13 with number of words 
used, and —.03 with number of pauses. Audi- 
ence sensitivity correlated —.27 (very signifi- 
cant) with number of pauses, but was not 
significantly correlated with the other pro- 
ductivity measures. Test anxiety correlated 
a nonsignificant .17 with the duration measure,
—.08 with number of words, and —.15
with number of pauses. 


Other Measures of Dispositional Anxiety 


Cervin (1957) employed 30 pairs of sub- 
jects with opposing scores on the E scale, a 
Guttman-type scale of emotional responsive- 
ness, and 30 pairs with equal scores, from a 
pool of 700 college students. Subjects discussed
topics on which they strongly disagreed.
Subjects having high E scores spoke
for a longer total time (p < .06 or p < .01
on the two statistical tests used) and had a
significantly shorter latency.


Levin et al. (1960) used a paper-and-pencil
self-consciousness scale, correlated “substantially”
with the Children’s MAS (.61) and
Sarason’s TAS (.59; Paivio, Baldwin, & Ber- 
ger, 1961). The scale was administered to 48 
ten- to twelve-year-olds. Included also were 
scores on a test of exhibitionism, and whether 
the subject spoke in public to an audience of
seven or in private to the experimenter. Subjects
high in self-consciousness spoke for more
seconds than those low, and though the self-consciousness
factor apparently did not reach
significance there was a significant interaction 
between self-consciousness and stress as ma- 
nipulated by public or private speaking. For 
subjects high in exhibitionism in the low- 
stress situation, subjects high in dispositional 
anxiety spoke an average of 275 words, those 
low 265. For the low in exhibitionism the 
comparable figures are 341 and 231, so for 
both high- and low-exhibitionism groups the 
figures are consistent with the U-curve hy- 
pothesis. Under high stress this relationship 
should reverse, with subjects low in disposi- 
tional anxiety talking more. For those high in 
exhibitionism, subjects low in dispositional 
anxiety spoke an average of 227 seconds, those 
high an average of 208; for the low in exhibi- 
tionism the comparable figures are 168 and 
141, again consistent with the U-curve hy- 
pothesis. 
McCoy (1965) had high and low test-anxious
fourth-grade children perform an essentially
stressful drawing task. Subjects with
high scores on the Children’s TAS had significantly
fewer (3.8 to 5.6) free verbalizations.
On a presumably less stressful, but certainly 
somewhat stressful drawing task no signifi- 
cant difference was found in number of verbal- 
izations. A lowered degree of stress might have 
put the children low and high in test anxiety 
at the top of the U curve, the high anxious to 
the right and the low anxious to the left of 
asymptote, but both at about the same height 
on the verbal quantity axis. The inverted-U- 
curve hypothesis would predict no significant 
relationship with verbal productivity under 
moderate stress and a significant negative 
relationship under high stress. 


Summary for Dispositional Anxiety


Table 2 shows a definite tendency for dispositional
anxiety to be positively related to
verbal quantity and negatively to silence,
while rate shows no clear trend. Higher stress 
levels appear to be a factor in some discordant 
results. Overall results show a reasonable 
degree of consistency, but not at the very 
high level found with situational anxiety, 
perhaps because verbal samples were ob- 
tained under varying and not easily specifi- 
able levels of stress. 


CONCURRENT ANXIETY 


In this group of studies it is difficult to 
classify anxiety as dispositional or situational. 
Anxiety, measured by occurrence of ongoing 
speech disturbances or physiological changes 
and so labeled concurrent, is usually corre- 
lated with the verbal productivity variables 
(see Table 3). Subjects are limited to college 
students (mostly student nurses), neuropsy- 
chiatric patients, and children, with one ex- 
ception. 


Speech Disturbance Measures 


Questions have been raised concerning the 
validity of speech disturbances as a measure 
of anxiety (Boomer & Goodrich, 1961; Cook, 
1968; Krause, 1961a, 1961b), and the prac- 
tice has been defended by others (Pope & 
Siegman, 1964; Siegman & Pope, 1965a). 
With two exceptions the speech disturbance 
measure used is Mahl’s (1959) “non-ah” 
speech disturbance ratio. 

The earliest of the studies in this group 
(Mahl, 1959) reported nonsignificant corre- 
lations of .35 between the non-ah speech dis- 
turbance ratio and speech rate for 11 males 
and —.02 for 20 female neuropsychiatric pa- 
tients being interviewed. Krause (1961a) 
used two measures of speech disruption in 
scoring 10-minute recorded samples of 15 
male neuropsychiatric patients, computed in- 
traindividual correlations of both of these 
with verbal quantity, rate, and latency, and 
then reported the median correlations in each 
case. For the non-ah speech disturbance ratio 
these were .28 with number of words and .32
with rate, both significant, and .01 with
latency. For the Dibner Cue Count I, a similar
speech disturbance measure correlating .91
with the non-ah speech disturbance ratio,
equivalent correlations were .20, .39, and
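—.04, the first two significant.


TABLE 3
CORRELATIONS BETWEEN CONCURRENT ANXIETY AND THE VERBAL PRODUCTIVITY MEASURES

                                                             Verbal productivity measures
Studies                       Anxiety       Subjects(b)   ------------------------------------
                              measure(a)                  Quantity    Silence(c)     Rate

Mahl (1959)                   Non-ah                                                 +.35(d)
Krause (1961a)                Non-ah                      +.28*(e)    +.01 L(e)      +.32*(e)
Feldstein (1962)              Non-ah        30 NP         +.33**
Pope & Siegman (1962)         Ah and Non    8 NP          +.24**
Pope & Siegman (1964)         Non-ah        30 SN         +.50**      —.27 LP
Siegman & Pope (1965a)        Non-ah                                  —.07 LP
Pope & Siegman (1966)         Non-ah                      +.30*       —.09 LP
Pope & Siegman (1968)         Non-ah        32 SN         +.36*       +.38* P
Levin & Silverman (1965)      SC/Word       48 Child      —.17        [illegible]    —.36*
                              Rep/Word                    —.24        +.56** P       —.20
Levin et al. (1967)           Hes/Word      24 Child      +.17        +.54** P(k)    —.66**
Bernick & Oberlander (1968)   Pup. Siz.     12 Coll.      +**
Pope & Siegman (1964)         GSR           30 SN         +.40*       —.19 PL
Innis et al. (1959)           Bl. Pres.     1 NP          +*

Note.—Where directional sign only is given, actual correlation was not reported or study was not correlational. Sign shows direction of relationship.
(a) Non-ah = Non-ah speech disturbance ratio; Ah and Non = Ah + Non-ah speech disturbance ratio; SC/Word = sentence corrections per word; Rep/Word = repetitions per word; Hes/Word = hesitations per word; GSR = galvanic skin response; Pup. Siz. = pupil size; Bl. Pres. = blood pressure.
(b) NP = neuropsychiatric patients; SN = student nurses; Coll. = college students; Child = children.
(c) L = latency; P = pauses or silence quotient. Where both are indicated, the correlation listed is the higher of the two.
(d) For 11 males; —.02 for 20 females.
(e) For the Dibner Cue Count, +.20 for quantity, —.04 for latency, +.39* for rate.
(k) For total number of pauses; .48 with total time pausing, .79 with number of pauses per word.
* p < .05.
** p < .01.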

With 60 male patients, half hospitalized 
for schizophrenia and half for nonpsychi- 
atric disabilities, telling stories to four sets of 
cartoonlike strips of five pictures, Feldstein 
(1962) found a very significant correlation of 
.33 between number of words and non-ah 
speech disturbance ratio. In a study of eight 
transcripts of initial interviews with neuro- 
psychiatric patients, Pope and Siegman 
(1962) found intraindividual correlations be- 
tween speech disturbance (ah and non-ah 
combined) ratio and number of clause units 
ranging from —.01 to .50, with a median 
correlation of .24, and all but two very sig- 
nificant, 

In four studies Pope and Siegman used stu- 
dent nurses as subjects. In the first, speech 
samples were from stories told to TAT cards. 
The remaining three involved experimental 
analogues of initial interviews. Pope and Sieg- 
man (1964) with a sample of 30 obtained a 
very significant correlation of .50 between
non-ah speech disturbance ratio and number
of words used, a nonsignificant correlation of
—.27 with latency, and a negligible correlation
with silence quotient. Siegman and Pope
(1965a) with 50 subjects found negligible
negative correlations between non-ah speech
disturbance ratio and both silence quotient
and latency. Using a very similar design
(Pope & Siegman, 1966) a significant correlation
of .30 was obtained between non-ah
speech disturbance ratio and number of words,
but negligible negative correlations between
speech disturbance ratio and both silence quotient
and reaction time. With 32 subjects
(Pope & Siegman, 1968) there was again a
significant (.36) correlation between non-ah
speech disturbance ratio and number of words,
but surprisingly a significant one, also positive,
of .38 with silence quotient, and a nonsignificant
one of .29 with latency.

Levin and Silverman (1965) asked 58 fifth-grade
children to give stories in response to
sentence stems. They recorded two speech
disturbance measures—sentence corrections 
per word and repetitions per word; two mea- 
sures of verbal quantity—number of words 
and number of seconds talked; two silence 
measures—silences under and over one second 
in duration; and rate. They derived scores 
for each of two periods in which the children 
gave their stories, yielding a total of 20 correlations.
All the correlations between speech
disturbance and verbal quantity were non- 
significant, but the four first-period correla- 
tions are negative and relatively high (—.14 
to —.24) while the four second-period cor- 
relations are in two instances lower but in the 
same direction (—.09, —.01) and in the other 
two instances positive (.05, .22). Of the four 
correlations with rate, three are negative 
(—.19, —.20, —.36), the last significant, 
and one is a negligible .01. Of the eight cor- 
relations with silence, all but one are positive, 
half significantly so. 

The assumption that verbal quantity bears 
an inverse U-shaped relationship to anxiety 
can be used to reconcile the nonsignificant, 
sometimes negative relationship found be- 
tween concurrent anxiety and verbal quantity 
in the last study with children, and the uni- 
formly significant positive relationship found 
in all of the studies with adults. A child 
asked to tell a story to one or more strange 
adults should experience more stress than an 
adult. If the stress is relatively high this
could move the whole interview to the right
side of the U curve, yielding a negative relationship
between anxiety and verbal quantity.
Assuming that an initial performance
under these conditions would be more upsetting
than a second one, if the correlations
for the first session were negative, those for 
the second session would decrease in size or 
become positive, which was what happened 
with each of the four correlations involved. 
Levin et al. (1967) selected three boys and 
three girls randomly from each of the four 
alternate grades from kindergarten to sixth. 
Subjects were shown three simple science 
demonstrations and asked to describe what
happened and why. The speech disturbance
measure used was quite similar to the ah
and non-ah speech disturbance ratio. Correlation
with number of words used was .17 and
with speech rate a very significant —.66.
Correlation of speech disturbance was .54 with
total number of pauses, .48 with total time 
pausing, and .79 with number of pauses per 
word, all very significant. The verbal task 
may have been less stressful in this study 
because the children were given a concrete 
demonstration to talk about. Also they talked 
only to the experimenter, whereas half the 
stories in the earlier study (Levin & Silver- 
man, 1965) were to an audience of seven 
adults. Assuming that talking before adults is 
less stressful for adults than for children, that 
a first interview for children is more stressful 
than a second, and telling a story more stress- 
ful for a child than talking about something 
he has just seen, then there is a consistent 
and understandable trend in the size and 
direction of correlations between concurrent 
anxiety and verbal quantity for adults (rela- 
tively large and positive), for children telling 
a story in a second session (close to zero), 
and for children telling a story in a first 
session (relatively small but minus). This 
trend would be predicted from an inverted-U 
relationship between anxiety and verbal 
quantity. 

For the speech disturbance studies, correla- 
tions of silence with speech disturbance are 
not, as might be expected, mostly negative 
and so in the reverse direction to the positive 
correlations between verbal quantity and 
speech disturbance. Two of the three studies 
with significant positive correlations between 
speech disturbance and silence are of chil- 
dren, for whom the situation may have been 
more stressful. Silence is a disfluency in the
same sense as speech disturbance ratios. As 
such, those factors which increase other dis- 
fluencies appear likely to increase silence, 
resulting in positive correlations between 
silence and speech disturbances, or reducing 
the size of negative correlations. 


Physiological Measures


Three studies used concurrent physiological
measures which are often considered indicative
of anxiety. In the first, Bernick and
Oberlander (1968) asked 10 male and 2
female collegians to talk about anything,
think about various subjects, and talk about
a problem. Pupils dilated very significantly 
more while subjects were talking than while 
thinking, consistent with Strahan’s (1968) 
finding that any speech raises both heart rate 
and skin conductance compared with the non- 
speaking state. Talking, per se, appears some- 
what stressful, though below the mid- 
point of the stress axis and thus to the left 
of the asymptote of the inverse-U curve 
relating anxiety and verbal quantity. 

The second study (Pope & Siegman, 1964) 
found galvanic skin responses of student
nurses correlated a significant .40 with number
of words used in TAT responses, but
nonsignificantly with reaction time (—.05) 
and silence quotient (—.19). In the last 
study (Innes, Millar, & Valentine, 1959) of 


TABLE 4
SUMMARY OF STUDIES SHOWING SIGNIFICANT RELATIONSHIPS BETWEEN ANXIETY AND VERBAL
PRODUCTIVITY VARIABLES

                                      Verbal productivity variables
Type of           Direction of      ---------------------------------
anxiety           relationship      Quantity      Silence      Rate

Dispositional          +                4            0           0
                       —                1            3           0
Concurrent             +                8            3           1
                       —                0            0           2
Situational            +           [illegible]       4           3
                       —               14            2           2

Note.—A study is counted only once within a type of anxiety, even though it may
have used two measures of that type. If two scales are used, one yielding
significant, the other nonsignificant results, the study is included.


and concurrent anxiety and a negative one
with situational anxiety. Silence seems to work 
in a way opposite to verbal quantity for dispositional
and situational anxiety, but for concurrent
anxiety the relationship to silence is
more often in the same direction as that to 
verbal quantity, which may be due primarily 
to results in studies with children or to the 
natural correlation between silence and other 
speech disfluencies. The nature of the rela- 
tionship between rate and anxiety is not clear 
from the summary table. The reviewer feels 
that close examination of the pertinent studies 
suggests rate is related to anxiety in the same
way as verbal quantity. The summary table 
could support the view that there are three 
anxiety factors, each relating somewhat differently
to verbal productivity indexes.


The U-Curve Hypothesis 


The data can also be seen as suggesting
that as stress is increased from minimal or
mild to moderate, situational anxiety increases
verbal productivity, reaching
asymptote on the inverted-U curve somewhere
in the moderate range. As stress [...]
beyond moderate [...] approximately equal. [...]
dispositional anxiety are [...] intermediate
point between moderate [...] groups.
Verbal productivity differences between
groups low and high in dispositional anxiety
do not appear great, and the relationship between
verbal productivity and dispositional
anxiety is usually low, though often significant.
Under conditions of mild stress most
subjects would be ranged on the left or as- 
cending shoulder of the inverted-U curve, 
with those low in dispositional anxiety tend- 
ing to be further down the slope than those 
high, yielding the frequently found higher
verbal productivity in the high dispositional 
anxiety group. In a moderate stress situation 
most subjects would be ranged around the 
asymptote of the inverted-U curve with those 
low in dispositional anxiety tending to cluster 
to the left of asymptote, those high to the 
right, thus resulting in no significant group
differences in verbal productivity. In a severe 
stress situation most subjects would be ranged 
to the right of asymptote on the curve, with 
those high in dispositional anxiety further 
down the right side of the curve, thus result- 
ing in their showing lower verbal productivity. 
The reason that most studies of dispositional 
anxiety result in positive correlations with 
verbal productivity, or higher mean verbal 
productivity scores for groups high in dis- 
positional anxiety, is probably that in the 
majority of such studies the verbal produc- 
tivity measures have been collected under 
conditions of minimal stress. 

The generally positive correlations between
[...] measures of verbal productivity
and concurrent anxiety also seem to be
a result of the tendency to sample verbal productivity
under conditions of minimal subject
stress. In the typical concurrent anxiety
study, silence appears to be a function of
both verbal productivity and anxiety, and
thus not a good measure to use in studying
the relation of verbal productivity to anxiety.

Both Schlosberg (1954) and Leuba (1955)
discussed the idea that emotions can be either
organizing or disruptive, pleasant or unpleas- 
ant, and that moderate stimulation or activa- 
tion may be maximally effective in learning 
or performance. Fiske and Maddi (1961) presented
a theory of the inverted-U shaped
curve of performance as a function of activation,
and suggested that this curve is descriptive
of performance on almost any task. Below
maximal performance is seen as a result of
so little activation that attention and concentration
are impaired, or of so great activation
that there is distracting tension and
hyperactivity. Hebb (1955) has also dis- 
cussed what appears essentially to be an 
inverted-U curve, rising to an optimal level 
of response with increasing alertness and then 
declining as cortical arousal and anxiety 
increase yet further. 

As applied to anxiety and verbal productiv- 
ity, anxiety would be seen as either a syno- 
nym for activation or an indication of high 
activation, and verbal productivity as one 
type of performance. Anxiety is an inter- 
vening variable and verbal productivity a 
response variable. The stress level of the situ- 
ation and the stress tolerance or stress vul- 
nerability of the organism are independent 
variables, working perhaps in some simple 
additive fashion to determine the level of 
anxiety. The relationship between either of 
the independent variables and anxiety appears 
linear (the higher either stress or stress vul- 
nerability, the higher the anxiety), while 
that between anxiety and verbal productivity 
appears curvilinear. 
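
The model just described can be sketched in a few lines of Python. The sketch is illustrative only and is not part of the studies reviewed: the function names, the quadratic form of the curve, the peak value of 5, and the vulnerability scores of 1 and 3 are all assumptions chosen so that the arithmetic is easy to follow. Anxiety is the simple sum of situational stress and dispositional vulnerability, and verbal productivity is an inverted-U function of anxiety.

```python
def anxiety(stress, vulnerability):
    """Intervening variable: additive, linear in both independent variables."""
    return stress + vulnerability


def verbal_productivity(anxiety_level, peak=5.0):
    """Response variable: inverted-U in anxiety, highest at `peak` (assumed)."""
    return -(anxiety_level - peak) ** 2


def high_minus_low(stress, low_vulnerability=1.0, high_vulnerability=3.0):
    """Productivity of the high-disposition group minus the low group.

    The two vulnerability values are hypothetical group scores.
    """
    high = verbal_productivity(anxiety(stress, high_vulnerability))
    low = verbal_productivity(anxiety(stress, low_vulnerability))
    return high - low


if __name__ == "__main__":
    # Hypothetical stress levels for the three situations discussed in the text.
    for label, stress in [("mild", 1.0), ("moderate", 3.0), ("severe", 5.0)]:
        print(label, high_minus_low(stress))
```

Under these assumed values the group difference is positive under mild stress, zero under moderate stress, and negative under severe stress, which is the pattern the U-curve hypothesis predicts for dispositional anxiety.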

Generalizability of conclusions is somewhat 
limited due to overwhelming use of college 
students, neuropsychiatric patients, and less 
often, children as subjects, almost ignoring 
noncollegiate, mentally healthy adults. Gen- 
eralizability is also limited due to the paucity 
of studies from other cultures and indeed from 
languages other than English. 


The Box-Score Effect 

Gardner (1966) has raised the possibility 
that the use of box-score-type tables may be 
misleading due to a tendency of experimenters,
particularly early in the exploration of an
area in psychology, to publish primarily significant
results and because the score keeper
can compare a limitless number of hypotheses
to the data, thus capitalizing unfairly and
misleadingly on chance. He pointed out that
as positive results pile up more skeptical
researchers are drawn into the area, and it
becomes more profitable to publish negative
results. Therefore one way to test for the
box-score effect is to see whether there is a
negative relationship between date of the
study and results.


For verbal quantity and situational anxiety,
if results before and after 1963 are compared,
six significant negative and three other re- 
sults are found before, nine significant nega- 
tive and three other results are found after 
1963. No box-score effect is present. 
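
The before-and-after comparison can be checked with a short calculation. The sketch below uses only the counts quoted in the preceding paragraph; the function names and the choice of a one-sided Fisher exact test are assumptions made for illustration and are not the procedure used in the review. A box-score effect would show up as a higher share of significant results in the earlier period.

```python
from math import comb


def share_significant(significant, other):
    """Proportion of studies in a period reporting the significant result."""
    return significant / (significant + other)


def fisher_one_sided(sig_early, other_early, sig_late, other_late):
    """Probability that the early period holds at least `sig_early` of the
    significant studies when period and outcome are unrelated.

    Hypergeometric upper tail with all margins of the 2 x 2 table fixed.
    """
    n_early = sig_early + other_early
    n_sig = sig_early + sig_late
    n_other = other_early + other_late
    total = n_sig + n_other
    upper = min(n_early, n_sig)
    tail = sum(
        comb(n_sig, k) * comb(n_other, n_early - k)
        for k in range(sig_early, upper + 1)
    )
    return tail / comb(total, n_early)


if __name__ == "__main__":
    # Counts from the text: verbal quantity and situational anxiety.
    early = share_significant(6, 3)   # before 1963
    late = share_significant(9, 3)    # after 1963
    p_value = fisher_one_sided(6, 3, 9, 3)
    print(f"early {early:.2f}, late {late:.2f}, one-sided p = {p_value:.2f}")
```

With these counts the share of significant negative results is slightly higher in the later period, and the one-sided probability is far from conventional significance, which agrees with the conclusion that no box-score effect is present.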

There are only nine studies of silence and
situational anxiety. Before 1964 there were 
two reporting a significant positive rela- 
tionship and two others; after 1964 the 
comparable figures were two and three. 

Of 10 studies of concurrent anxiety and 
verbal quantity, the only 2 not showing a 
significant positive relationship are those of 
the Levin group at Cornell, using child sub- 
jects, and both recent (1965 and 1967). 
However, the two 1968 studies both show 
significant positive results and the deviant re- 
sults of the Cornell group can as readily be 
ascribed to their child subjects as to the 
box-score effect. There are only six studies 
of concurrent anxiety and silence, five yielding 
significant positive results, so no trend can be 
seen. 

For 10 correlations of dispositional anxiety 
with verbal quantity through 1958 there are 
two significant positive relationships and three 
others; after 1960 there are also two significant
positive results and three others. The
data on dispositional anxiety and silence 
show a strong though not statistically signifi- 
cant trend with three studies coming in 1955, 
1956, and 1957 showing significant negative 
results and four studies coming in 1957, 1958, 
1965, and 1967 showing nonsignificant results, 
but all still negative. 

The reviewer’s interpretation of this analy- 
sis is that the relationships found between 
verbal productivity and anxiety cannot be 
attributed in any significant degree to the 
box-score effect. 

For the future, research is needed using 
four or more levels of stress arousal and four 
or more levels of dispositional anxiety, since 
studies with dichotomized variables often 
mask curvilinear relationships. Studies of con- 
current anxiety, with verbal samples collected 
under low, moderate, and high stress condi- 
tions would be informative. Studies using 
curvilinear correlation coefficients to relate 
anxiety and verbal productivity would also be 
crucial in further investigations of the relationship
between verbal productivity and
anxiety.
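
The case for several anxiety levels and a curvilinear coefficient can be illustrated with made-up data. In the sketch below, five hypothetical anxiety levels are paired with productivity scores that follow an inverted U; the data, the function names, and the use of the correlation ratio (eta) as the curvilinear coefficient are assumptions for illustration, not results from any study reviewed here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Ordinary product-moment correlation (detects linear trend only)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_ratio(levels, ys):
    """Eta: square root of between-level variation over total variation."""
    grand_mean = sum(ys) / len(ys)
    groups = {}
    for level, y in zip(levels, ys):
        groups.setdefault(level, []).append(y)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    return sqrt(ss_between / ss_total)


# Hypothetical data: three subjects at each of five anxiety levels,
# with productivity peaking at the middle level.
levels, productivity = [], []
for level in range(1, 6):
    base = 10 - (level - 3) ** 2
    for noise in (-1, 0, 1):
        levels.append(level)
        productivity.append(base + noise)

# Dichotomized comparison: levels below the midpoint versus above it.
low_half = [y for lv, y in zip(levels, productivity) if lv < 3]
high_half = [y for lv, y in zip(levels, productivity) if lv > 3]
low_mean = sum(low_half) / len(low_half)
high_mean = sum(high_half) / len(high_half)

r = pearson_r(levels, productivity)
eta = correlation_ratio(levels, productivity)

if __name__ == "__main__":
    print(f"r = {r:.2f}, eta = {eta:.2f}, "
          f"low-half mean = {low_mean}, high-half mean = {high_mean}")
```

In these invented data the linear correlation and the high-versus-low group difference are both zero even though productivity depends strongly on anxiety level, while eta is large. A design with only two levels could not distinguish this pattern from no relationship at all.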
METHODOLOGICAL CONSIDERATIONS RELEVANT TO
DISCRIMINATION IN EMPLOYMENT TESTING


Wayne State University


Test discrimination can be defined as an over- or underprediction of criterion
scores for different subgroups of job applicants. It is shown here that one must
take into account differences between subgroups with respect to test-criterion
correlations, criterion means and variances, and differences in standard errors
of estimate if one is to avoid unfair discrimination. In addition, utility assumptions,
which play an important part in any selection strategy, are considered
here with respect to test discrimination. A method is developed which utilizes
all the relevant information concerning differences between groups to arrive
at a nondiscriminatory procedure for selection.


A problem of current concern to the per- 
sonnel psychologist is racial or ethnic group 
discrimination in employment testing. There 
are now numerous guidelines and policy state- 
ments issued by local, state, and federal agen- 
cies concerning the appropriate use of tests 
for selection purposes. A recent federal execu- 
tive order (Department of Labor, 1968) re- 
quires all companies receiving federal monies 
to provide evidence of the validity of employ- 
ment tests and to take affirmative action to 
see that such tests are not used to discriminate 
against minority groups. 

Unfortunately, the issues involved in test 
discrimination have not been clearly spelled 
out, and it is not even clear exactly what 
constitutes discrimination in employment test- 
ing. The purpose of the present paper is three- 
fold: (a) to attempt to delineate the meaning 
and usage of the term “discrimination” as 
applied to employment testing, (b) to illustrate
several different ways in which unintentional
test discrimination might occur, and (c)
to suggest a method for using selection tests 
in order to avoid unfair discrimination while 
at the same time obtaining the most desirable 
candidates for a job. While many of the points 
made here are not entirely original with us, 
we feel that the present exposition is important,
since these issues have not previously
been systematically discussed and integrated 
with respect to their implications for non- 
discriminatory use of selection tests. 


THE MEANING OF TEST DISCRIMINATION


Test discrimination has frequently been 
thought of in terms of systematic (mean) dif- 
ferences in test scores between clearly defined 
groups. Thus, if a particular test consistently 
shows lower average scores for Negroes than 
for whites, it is often thought that the test 
systematically discriminates against Negroes. 
As Anastasi (1968) has pointed out, this is 
an unsatisfactory way of defining test discrimination.
In the first place, it is quite likely
and to be expected that any variable will show
some differences between almost any specified
groups. Second, and most important, evidence
of real differences between groups on a test
is not, in itself, prima facie evidence of test
discrimination. That is, such evidence does
not necessarily imply that the test is discriminatory
with respect to these groups, but
rather that different underlying factors may 
be responsible for score differences between 
the groups. Thus, if a test of verbal comprehension
systematically discriminated between
large samples of Negro and white children,
such that whites have consistently higher
mean scores than Negroes, then it is incumbent
upon the test critic to demonstrate that
the test, qua test, is discriminatory. In such a
case, a reasonable argument could obviously
be made that such factors as cultural backgrounds,
educational opportunities, and family
attitudes toward reading and verbal behaviors
are largely responsible for differences between
the groups on test scores. As Anastasi (1968)
has suggested, it would then be no more
meaningful to speak of the test, qua test, as
discriminatory than it would be to speak of
a thermometer which showed systematic differences
in body temperatures between influenza
victims and healthy individuals as a
“discriminatory” thermometer. Thus, the discrimination
that may exist lies not so much
in the test or measuring instrument as in the
background variables that the test or measure
reflects.

Another and somewhat more sophisticated 
definition of test discrimination is usually 
couched in terms of differences between 
groups in the relationship between test and 
criterion scores. Thus, for example, if a par- 
ticular employment test shows differences in 
validity for a given criterion measure between 
Negro and white job applicants, then the test 
can be said to be discriminatory in nature 
with respect to these racial groups if this difference
is not taken into account. Anastasi
(1968) referred to this kind of test discrimination
as test bias, and defined bias in the
following way:

In the psychometric sense, test bias refers to overprediction
or underprediction of criterion measures.
If a test consistently underpredicts criterion performance
for a given group, it shows unfair discrimination
or “bias” against the group [p. 559].


FIG. 1. Groups with equal regressions but different
criterion means. (Axes: TEST, CRITERION; regression lines
for Group A, Group B, and the combined group.)

Anastasi gave an example of test bias which
is reproduced in Figure 1. As can be seen,
Groups A and B have identical regression-line
slopes and identical means on the predictor,
but Group B has a higher criterion mean. If
the two separate groups were pooled and
treated as one group, the resulting regression
line for the combined group would be as shown
in Figure 1. If the combined group regression
line were used for predicting criterion scores
for all persons, then criterion scores for Group
B would be consistently underpredicted, and
therefore this group would be discriminated
against by the test.
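
The underprediction described for Figure 1 can be sketched numerically. The following is a minimal illustration, not the authors' data: the test scores, the common slope of 0.5, and the 4-point criterion gap between the groups are all hypothetical values chosen only to mimic two groups with equal regressions and equal predictor means but different criterion means.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: identical test scores and slope in both groups,
# but Group B's criterion scores are 4 points higher throughout.
test_scores = [10, 20, 30, 40, 50]
group_a = [0.5 * x + 2 for x in test_scores]
group_b = [0.5 * x + 6 for x in test_scores]

# Pool the two groups and fit one combined regression line.
slope, intercept = fit_line(test_scores * 2, group_a + group_b)

def mean_error(ys):
    """Mean of (actual - predicted) when the pooled line is used."""
    return mean(y - (slope * x + intercept) for x, y in zip(test_scores, ys))

error_a = mean_error(group_a)  # negative: Group A is overpredicted
error_b = mean_error(group_b)  # positive: Group B is underpredicted
print(slope, intercept, error_a, error_b)
```

In this toy case the pooled line keeps the common slope but sits midway between the two group lines, so every Group B member is underpredicted by the same amount that every Group A member is overpredicted.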

Another definition of test discrimination is 
given by Guion (1966). He made a distinction 
between fair and unfair test discrimination, 
pointing out that “Unfair discrimination ex- 
ists when persons with equal probabilities of 
success on the job have unequal probabilities 
of being hired for the job [p. 26]." There 
are, of course, many ways in which unfair
discrimination in this sense can occur. It
seems clear, however, that one basis for unfair
test discrimination would be the case in
which job success for members of one group
was consistently underpredicted, as in the
preceding illustration. Thus, Anastasi's definition
of test bias may be seen as a special
case of Guion's more general definition of test
discrimination.

A third, similar definition of test discrimination
is offered [...] currently is used to [...]
criterion scores are “inappropriately” or “incorrectly”
predicted [...] than to some property or
characteristic of the test itself.

It should also be [...] tests are used appro[...]
discrimination [...]ures themselves are [...]
with respect to different subgroups. However, for the present discussion
we are concerned about the use of tests with
respect to unfair discrimination assuming an
acceptable criterion is given.


DIFFERENTIAL VALIDITY 


In addition to the concept of test discrimination,
Kirkpatrick et al. also discussed the
notion of differential validity of tests. Here
they referred to the case in which a test has
different validities for different groups. Differential
validity would occur when a test was
valid for one group but not for another, when
the validity was considerably (and presumably,
significantly) higher for one group than
for another, or when the validities for different
groups were considerably different from
one another (e.g., positive for one group and
negative for another). Of course, this is basically
the case of moderated prediction (cf.
Saunders, 1956) in which the moderator
variable is group membership.

Differential validity and test discrimination 
are, as Kirkpatrick et al. pointed out, con- 
ceptually different and independent concepts. 
Tests may be differentially valid and yet not 
be used in a discriminatory manner, and tests 
may show no evidence of differential validity 
and yet be used in a highly discriminatory 
fashion. Thus, if a moderated relationship
(differential validity) does not occur for two
groups (e.g., Negroes and whites) it is still
possible for unfair discrimination to occur,
if the test scores are used improperly (cf.
Figure 1). In the sample represented in Figure 1,
there is no differential validity, since
the two groups differ only with respect to the
criterion means. If a combined regression
equation were used here, however, discrimination
would occur. On the other hand, when
differential validity does occur, then discrimination
is also likely unless, again, one is careful
to use the tests properly by using separate
and appropriate prediction procedures for
each group. The implication here is that it
is always necessary to investigate the possibility
of different regression functions for
different groups since, otherwise, one might
be guilty of discrimination in the use of tests
if the same (and inappropriate) regression
function were used for all groups in a selection
procedure where different regression functions
were, in fact, justified.

FIG. 2. Groups with different regressions, predictor
means, and criterion means.

Figure 2 illustrates the case of a test which 
might be used to discriminate against each of 
two different groups if a combined regression 
line were employed rather than separate 
regressions for the two groups. 

Assuming that the groups here are white
and Negro, the correlation (validity coefficient)
for the Negro group is .40, while the
validity for the white subgroup is .00. However,
in this example, the white subgroup has
a higher mean on the predictor as well as on
the criterion measure. The combined regression
line for the total group is also shown.
If one were to use the combined regression
line for predicting criterion scores for both
groups, one would be discriminating against
Negroes and whites. A Negro scoring at
Point P on the test would be predicted to have
a criterion score at Y2, whereas his predicted
score using the regression line for his group
indicates a predicted score at Y1. Similarly,
for the white who scores at Point O on the
test, his predicted criterion score using the
combined regression line is at Point Y4
whereas it should be at Point Y3.

OTHER SOURCES OF DISCRIMINATION IN THE
USE OF TESTS


In addition to the possibility that differential
test validities may lead to discriminatory
use of selection tests, there are other ways 
in which test-criterion relationships may differ 
for different subgroups which also may lead 
to discriminatory test practices. For example, 
in discussing moderators or control variables, 
some authors have stressed the notion of 
differential predictability (e.g., Banas, 1964; 
Ghiselli, 1956, 1960, 1963; Hobert & Dun- 
nette, 1967). In this approach, an attempt 
is made to identify subgroups of individuals 
who are more predictable and those who are 
less predictable in terms of a particular test- 
criterion relationship. That is, one attempts 
to develop or find variables which will differ- 
entiate persons whose actual criterion scores 
lie closer to the regression line from those 
whose criterion scores show greater deviations 
from the regression line. Such a variable may 
be considered to be a moderator variable since 
it would follow that the correlation between 
test and criterion would be higher for one 
group identified by the moderator variable 
(the predictables) than it would for the other 
group. 

In this connection it is important to note 
that while the validity of a test may be 
higher for one group than for another, this 
does not necessarily imply that the criterion 
scores of the group with the higher validity 
coefficient can be predicted more accurately 
than those of the group with the lower valid- 
ity (cf. McNemar, 1969). That is, it is not 
necessarily the case that the subgroup for 
which the test-criterion validity is highest
also has the smallest standard error of esti- 
mate. This is because the subgroups which 
differ in validity may also differ in the vari- 
ability of their criterion scores. To the extent 
that such differences in variability occur, the 
errors of estimate may also differ, so that 
differences in validity coefficients alone are 
not necessarily appropriate indexes of the 
accuracy with which criterion scores can be 
predicted.

From elementary regression and correlation
theory, it will be recalled that the correlation
coefficient is a direct (inverse) index of accuracy
of prediction for predicting standardized
criterion scores from standardized predictor
scores. Thus, the higher the correlation between
test and criterion, the more accurately
can one predict standardized criterion scores;
that is, the less variability of criterion scores about
the regression line. However, when one is concerned
with prediction of raw scores, the correlation
coefficient is no longer a direct index
of accuracy of prediction. Rather, it is a
function of the variability of the criterion
scores as well as of the standard error of estimate
[...]ate for the two groups
are the same.

The problem to which we refer here is the
common practice of assuming that because
two (or more) groups differ in their test-criterion
validities, these validity differences
are the only relevant differences with which
the personnel selection researcher need be concerned;
and that these validity differences
indeed reflect differences in the accuracy with
which criterion scores for members of the
different subgroups can be predicted. It is the
contention of the present study that differences
in errors of estimate for different subgroups
are also of concern to the personnel researcher,
and that these differences are relevant
to both the concepts of differential validity
and unfair discrimination to which we
have earlier referred. In fact, we would further
argue that in terms of the issue of test discrimination,
it is primarily differences between
different groups in standard errors of estimate,
rather than differences in test-criterion correlations,
which will be of most relevance.
When race is the moderator variable of interest,
it is usually the case that predictor and/or
criterion means and variances are different for
the different racial groups (cf. Bartlett &
O'Leary, 1969). It follows, then, that the
test-criterion correlation coefficient, which is
an index of accuracy of prediction of standardized
scores (where group means and variances
are equated), is less meaningful than
the standard error of estimate, which is
expressed in raw-score terms.
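
The point that a higher validity coefficient need not mean more accurate raw-score prediction can be illustrated with the standard textbook relation S_y.x = S_y * sqrt(1 - r^2). The sketch below assumes that relation and uses hypothetical values for the two groups; none of the numbers come from the article.

```python
import math

def standard_error_of_estimate(s_y, r):
    """Raw-score standard error of estimate: S_y.x = S_y * sqrt(1 - r^2)."""
    return s_y * math.sqrt(1.0 - r ** 2)

# Hypothetical groups: Group 1 has the higher validity coefficient but also
# much greater criterion variability than Group 2.
se_group_1 = standard_error_of_estimate(s_y=12.0, r=0.60)
se_group_2 = standard_error_of_estimate(s_y=6.0, r=0.30)

print(f"Group 1: r = .60, S_y.x = {se_group_1:.2f}")
print(f"Group 2: r = .30, S_y.x = {se_group_2:.2f}")
```

Under these assumed values, the group with the larger correlation has the larger standard error of estimate, so its raw criterion scores are predicted less precisely.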

In personnel selection, the objective is not 
only to accept those persons who are predicted 
to be above a certain minimum point on the 
criterion, but also to make these predictions 
with as high a degree of confidence as possible. 
This means that it is necessary for the person- 
nel psychologist to take into account not only 
an individual's predicted criterion score, but 
also, and quite importantly, the probability 
with which this score will fall below or beyond 
some criterion cutting point. In terms of 
avoiding unfair test discrimination, this means 
that one wants to take into account the 
probability of success associated with a given 
test score for each of the different subgroups,
rather than merely the predicted criterion
scores. This accords well with Guion's definition
of test bias in terms of taking into account
the notion of probability of success on
the job. In order to deal with this problem
it is essential that the standard errors of
estimate for the various groups be taken into
account.


For example, Figure 3a (left) shows a
situation in which two groups have identical
regression lines but differ in their error of
estimate. The test-criterion correlations here
might be identical or quite different for the
two groups, depending upon differences in
their criterion score variability. For a person
scoring at Point X on the test, the probability
of scoring above the criterion cutoff
point is greater for the person from Group 2
even though his predicted criterion score is
actually the same as that of the person from
Group 1. In this situation, the person from
Group 2 would be preferred.

In Figure 3b (right), the two groups differ
in both slope of regression line and variability
of scores about the regression line. In this
case, the standard error of estimate for
Group 1 is given as being 21 times as large
as that for Group 2. Interestingly enough, the
test-criterion correlation here is given as being
considerably larger for Group 1 than for
Group 2 (even though Sy.x is smaller for
Group 2), implying, of course, considerably
greater criterion variability for Group 1 than
for Group 2.

In this situation, the question as to which
person to hire when two individuals have the
same predictor score (X), but are from different
groups, becomes a problem in utility
theory (cf. Cronbach & Gleser, 1965). If one
assumes a dichotomous utility scale on the
criterion, the person from Group 2 should be
hired since he is more likely to be above the
cutting point on the criterion. However, if one
assumes a continuous utility scale underlying
the scores on the criterion, one might want
to select the person from Group 1, even
though he has a higher probability of being
below the criterion cutting point, since his
higher predicted criterion score might imply
a greater utility than the criterion score for
the individual from Group 2, that is,


$$p(Y_1 > Y_c)\,U(Y_1') > p(Y_2 > Y_c)\,U(Y_2'), \qquad [1]$$
where
$Y_1'$ = predicted criterion for Group 1,
$Y_2'$ = predicted criterion for Group 2,
$Y_c$ = cutoff point on criterion,
$U(Y_i')$ = utility of predicted criterion score
for group i.


This inequality would hold even though the
probability of a score being above the cutoff
is less for the individual from Group 1 as
long as the utility of Y1' is sufficiently higher
than the utility for Y2'. Utility assumptions,
while not often stated or even recognized, still
play a significant [...] any selection [...]
with which he [...] basis of this [...]
[r]elative utility [...] different criterion states.
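
The contrast between the dichotomous and continuous utility cases can be sketched as follows. This is only an illustration under stated assumptions: criterion scores are taken to be normally distributed about the predicted score with the standard error of estimate as standard deviation, the utility function U(Y') = Y' is a hypothetical linear choice, and all numbers are invented.

```python
from statistics import NormalDist

def p_above_cutoff(predicted, se_estimate, cutoff):
    """P(Y > cutoff) for a normal array centred on the predicted score."""
    return 1.0 - NormalDist(mu=predicted, sigma=se_estimate).cdf(cutoff)

cutoff = 8.0
# Hypothetical applicants with the same test score but from different groups.
pred_1, se_1 = 14.0, 7.5   # Group 1: higher prediction, larger error
pred_2, se_2 = 12.0, 3.0   # Group 2: lower prediction, smaller error

p1 = p_above_cutoff(pred_1, se_1, cutoff)
p2 = p_above_cutoff(pred_2, se_2, cutoff)

# Dichotomous utility: only the probability of exceeding the cutoff matters.
dichotomous_choice = "Group 1" if p1 > p2 else "Group 2"

# Continuous utility (assumed linear, U(Y') = Y'): weight each probability
# by the utility of the predicted score, as in Inequality 1.
continuous_choice = "Group 1" if p1 * pred_1 > p2 * pred_2 else "Group 2"

print(round(p1, 3), round(p2, 3), dichotomous_choice, continuous_choice)
```

With these assumed values the two utility scales lead to different choices, which is the situation in which the decision maker's judgment about utility becomes decisive.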


SUGGESTIONS FOR USE OF TESTS TO AVOID
DISCRIMINATION


Although recognizing discrimination is one
problem, deciding what to do about it seems
to be a more difficult problem. As Kirkpatrick
[...] “Solving problems of [...] found to exist,
is more [...] is ultimately desired [...] procedure
for all appli[...]n-minority [p. 19].”
[...] problem of test discrimination, [adv]ocated here, involves
the use of separate cutting scores on the
predictor tests so that the criterion means,
the different regression lines, and standard
errors of estimate are used to arrive at a
nondiscriminatory procedure. Such a procedure
is outlined below. The procedure described
here follows, in part, the technique
outlined in Magnusson (1967).

Figure 4 shows a selection variable and a
criterion measure for one group. The distribution
within the array is assumed to be normal.
The score on the predictor is denoted by X,
the predicted score on the criterion as Y',
and the criterion cutoff score as Y_c. The
shaded portion of the tail of the distribution
within the array shows the proportion of individuals
with a predictor score at X who will
fall below the criterion cutoff point. The problem
becomes one of finding a cutting score
on the predictor so that the predicted criterion
score will be above the criterion cutoff with a
designated probability. The predicted criterion
score can be expressed in terms of the raw-score
regression equation:

$$Y' = r_{xy}\frac{S_y}{S_x}X + \left(M_y - r_{xy}\frac{S_y}{S_x}M_x\right). \qquad [2]$$

Solving for the predictor score, X, we find

$$X = \frac{Y' - M_y}{r_{xy}\,S_y/S_x} + M_x. \qquad [3]$$

In order to find Y' (and X) it is necessary to
specify a z score corresponding to the designated
risk one is willing to take that a person
will be below the criterion cutting score. This
z score can be expressed in terms of the subdistribution
of criterion scores for the predic[...]
as the mean of this
distribution and designating Y_c as the cri[...]
which one will not con[...]s as satisfactory. Then,
designating the required z score as Z_p,

$$Z_p = \frac{Y_c - Y'}{S_{y.x}}. \qquad [4]$$

The standard error of estimate is used as the
standard deviation in calculating Z_p since this
is the standard deviation within each of the
arrays. Y' can now be expressed as

$$Y' = Y_c - Z_p(S_{y.x}), \qquad [5]$$

where Z_p can be de[...] desired risk. It should be noted that Z_p will
be negative for any risk less than .50 (since
Y_c will be below the mean within the array).
After Z_p has been specified by the decision
maker, Y' can be found and substituted in
Formula 3 to find the appropriate cutting
score on the predictor variable. This procedure
should then be used for each of the separate
groups. The degree of designated risk should
be specified as the same for the two groups;
but the cutting score on the predictor should
be obtained by using the correlations and standard
deviations separately in Formulas 3 and
5 for each specific group. It is assumed that
the criterion cutting score will be the same
for the two groups.

FIG. 4. Regression of test and criterion showing
designated risk level.
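
The cutting-score procedure can be sketched in a few lines. This is a reading of the method rather than the authors' own computation: it assumes Formula 5 has the form Y' = Y_c - Z_p * S_y.x, that Formula 3 is the raw-score regression equation solved for X, and that S_y.x = S_y * sqrt(1 - r^2). The function name and the group statistics are hypothetical and are not the values of Table 1.

```python
import math
from statistics import NormalDist

def predictor_cutoff(m_x, s_x, m_y, s_y, r, y_c, risk):
    """Return (X cutting score, predicted Y') for one group.

    risk is the accepted probability that a selected person falls
    below the criterion cutoff y_c.
    """
    z_p = NormalDist().inv_cdf(risk)           # negative when risk < .50
    s_yx = s_y * math.sqrt(1.0 - r ** 2)       # standard error of estimate
    y_pred = y_c - z_p * s_yx                  # assumed form of Formula 5
    x_cut = (y_pred - m_y) / (r * s_y / s_x) + m_x   # assumed form of Formula 3
    return x_cut, y_pred

# Hypothetical statistics for two groups; same criterion cutoff and risk.
group_1 = dict(m_x=50.0, s_x=10.0, m_y=20.0, s_y=5.0, r=0.50)
group_2 = dict(m_x=45.0, s_x=8.0, m_y=18.0, s_y=4.0, r=0.30)

for name, stats in (("Group 1", group_1), ("Group 2", group_2)):
    x_cut, y_pred = predictor_cutoff(**stats, y_c=18.0, risk=0.20)
    print(f"{name}: cutting score X = {x_cut:.2f}, predicted Y' = {y_pred:.2f}")
```

Because the risk and the criterion cutoff are held constant while the regression statistics are group specific, each group receives its own predictor cutting score.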

An example is used to illustrate the general
method. Table 1 gives some hypothetical data
that might be obtained for two separate
groups, Negroes and whites.

We will assume that the cutting point on
the criterion is Y_c = 8. We will also assume
that the risk we are willing to take on any
employee is .20, that is, using our procedure,
we will obtain 80% successful employees
(i.e., employees who score above the criterion
cutting score of 8 with .8 probability). Substituting
the values from the table in Equations
5 and 3 we find a cutting score on the predictor
of 23.55 for the Negro group and 32.66
for the white group. The reason for this can
be seen in the large differences between the
standard errors of estimate in the two groups.
For the Negro group, the predictor score of
23.55 gives a predicted Y' = 12.49 which also
has a .20 probability of falling below the criterion
cutting score. We assume that there is
an underlying dichotomous utility scale for
the criterion variable, which affects the
selection strategy that is used.

If one also assumes that the predictor scores 
are normally distributed, one can find the 
probability of any group member scoring at 
or above the predictor cutting point. For the 
Negro group the mean on the predictor is 15.0 
with a standard deviation of 8. The z score
corresponding to the cutting point on the 
predictor (23.55) is 1.07, so that approxi- 
mately 14% of the Negroes who take this test 
will score at or above this predictor cutting 
score (given the assumption of normally dis- 
tributed predictor scores). Similarly, the prob- 
ability of a white scoring above the predictor 
score of 32.66 is found to be approximately .10.
Thus, about 10% of the whites will score at 
or above the predictor cutting score necessary 
to have a predicted criterion score which has 
an 80% chance of being above the criterion 
cutting score. 

Some might argue that the procedure we 
have outlined is discriminatory against whites, 
since whites with higher predicted criterion 
scores than Negroes are not being hired. 
However, given the assumption of a dichoto- 
mous utility scale we have maximized the 
utility for the organization with the informa- 
tion we have. If the utility scale is assumed 
to be continuous, then a different strategy
could be used. The problem then becomes one
of using judgment to decide the relative
weight one is willing to place on the probability
of falling below a certain score versus
the weight one gives to the magnitude of the
predicted score. For the continuous utility
case Equation 1 can be generalized so that

$$p(Y_1 > Y_c)\,U(Y_1') \;\;?\;\; p(Y_2 > Y_c)\,U(Y_2').$$

In this situation the question mark denotes
that there may be an equal, less than, or
greater than sign in the equation depending
upon one's judgment concerning the relative
value of a high probability of success for a
lower predicted score and a lower probability
of success for a higher predicted criterion
score.


TABLE 1

MEANS (M), STANDARD DEVIATIONS (SD), AND
CORRELATIONS (r) FOR NEGRO AND WHITE GROUPS

Group      Variable    M        SD     r_xy
Whites     X           20       10
           Y           10       8      .40
Negroes    X           15       8
           Y           [...]    6      .10


In the case where the standard errors of 
estimate are equal, no problem occurs in the 
continuous or dichotomous utility case. Here, 
individuals with the highest predicted cri- 
terion scores are selected since probability 
of success will be higher for the higher scores. 
It is only in the continuous case where the 
standard errors of estimate are different that 
the method developed here needs to be sup- 
plemented by the judgment of the decision 
maker. 


CONCLUSIONS 


In terms of the concept of test discrimina- 
tion, the various definitions examined here 
agree that discrimination refers to cases where 
criterion scores for some groups are “incor- 
rectly” predicted by tests. In addition, as we 
have shown, test discrimination can occur 
when there are differences between groups 
with respect to criterion means, validity co- 
efficients, and/or standard errors of estimate 
when these differences are not taken into ac- 
count by the decision maker. These param- 
eters should always be examined with respect 
to the various groups and separate regression 
equations used if the groups vary on these 
parameters.

In keeping with Guion’s definition of un- 
fair discrimination it is also shown that the 
standard error of estimate should be incorpo- 
rated into the decision strategy. This assures 
that one will be taking probability of success 
into account as well as level of success. The 
use of the standard error of estimate also in- 
volves the question of differential criterion 
variance for the different subgroups. The 
moderator variable approach (differential validity)
has been used with some success in
dealing with minority group testing problems. 
However, differential validity is just one as- 
pect of differences between groups. It may 
well be that different subgroups also show 
differential variance on the criterion. If the 
standard error of estimate is used, this 
additional factor is taken into account. 

In any personnel decision problem, the con- 
cept of utility plays an important if somewhat 
unrecognized part. Depending on what one is 
willing to assume about the underlying utility 
of criterion scores, different strategies become
more appropriate. Given a dichotomous utility
scale, it is sometimes better to hire those with
lower predicted criterion scores (as long as 
they are above the predictor cutting point) 
if these scores have a higher probability of 
being above the cutoff point on the criterion. 
If the utility scale is assumed to be continu- 
ous, however, this strategy may not be best. 
The important point here is that one should 
at least be aware of the kinds of utility ques- 
tions that are involved in any testing program. 

The method outlined here for use in minor- 
ity group testing situations has the advantage 
of making use of all the relevant information 
concerning differences between the various 
groups. This method is nondiscriminatory 
since there is no over- or underprediction of 
criterion scores. In addition, the probability 
of success for persons selected from the dif- 
ferent groups is necessarily the same, since the 
standard errors of estimate have been taken 
into account in making selection decisions. 
The only case in which this method needs to 
be supplemented by the judgment of the 
decision maker is the case in which there is 
a continuous utility scale underlying the 
criterion scores and the two groups differ in 
their standard errors of estimate. In this case, 
one must make a judgment concerning the 
relative weights one is willing to place on 
probability of being above a certain score 
versus the magnitude of that score.

Finally, the general question of how many
different subgroups one should use becomes
especially important in dealing with
test discrimination. One might argue that one
should further divide each racial subgroup
into other subgroups (e.g., high and low
income) and then analyze the data for four
rather than two subgroups. If there are differences
between these four subgroups,
wouldn’t one be discriminating against low 
income people, for example, by not using such 
information? The answer to this question is 
“yes.” However, the problem becomes one 
of trying to find differences that will be stable 
over time (cross-validation is essential for any 
subgrouping technique), and more impor- 
tantly, attempting to balance the relative costs 
and benefits that accrue from such a proce- 
dure. When data are ample enough, such sub- 
grouping may be highly advantageous (e.g., 
Sonquist & Morgan, 1964) in describing un- 
explained variance. However, there comes a 
point of diminishing returns where further 
dividing of the sample results in no appreciable 
increase in explained variance and/or samples
that are too small to treat as separate groups.
Just where one stops in subgroup analyses
becomes a question of judgment, since the
costs and benefit to be derived from such a 
procedure must ultimately be made by the 
decision maker himself. However, in principle, 
it is appropriate to continue to use separate 
subgroup analysis for personnel selection pur- 
poses as long as the benefits obtained from 
such use of separate regression procedures 
outweigh the costs involved in establishing 
them. 

Finally, it should be recognized that the 
procedures discussed here are applicable to the 
general problem of moderator variables when 
subgroups are used as the moderator variable. 
We have chosen to concentrate on the use of 
race as a moderator variable, but it should
be clear that the issues discussed here apply
to any situation in which subgroups are used
as moderators of the relationship between
two other variables.


ON CALCULATING UNBIASED INFORMATION MEASURES

A. W. MACRAE1

University College of North Wales


[...] ways of estimating information free from sampling bias were tested
[...] Monte Carlo [...] with a digital computer. Two techniques
had little to commend them, but (a) two published techniques were effective with
[...] distributions; (b) calculating bias by Carlton's formula, with the sample
[...] as a model of the population probabilities, gave a good correction if the
sample was large enough; and (c) estimating the population mean and standard
deviation from the sample and using these to calculate population probabilities was
effective even with small samples and with a wide variety of distributions, although
it was slightly more restrictive in its assumptions than was Technique b. A table is
provided for hand calculation of Technique b, and computer procedures for Techniques
b and c have been made available.


Information measures are defined as averages
of logarithms of probabilities, but empirical 
samples provide proportions, which are only 
estimates of these probabilities. Although the 
average of the proportions is the best estimate 
of the average of the probabilities, the mean 
log proportion is lower than the mean log 
probability, so the empirically derived measure 
tends to underestimate the population 
information.
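
The downward bias can be seen in a small Monte Carlo sketch. It is not one of the techniques tested in the article: it simply computes the uncorrected information measure (in bits) from sample proportions and compares its average with the population value. The uniform eight-category population, the sample size of 20, the number of trials, and the function names are all assumptions made for illustration.

```python
import math
import random

def entropy_bits(probs):
    """Average information in bits: -sum(p * log2(p)), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def plug_in_entropy(sample, k):
    """Information measure computed from sample proportions (no correction)."""
    n = len(sample)
    counts = [0] * k
    for item in sample:
        counts[item] += 1
    return entropy_bits(c / n for c in counts)

rng = random.Random(0)            # fixed seed so the run is repeatable
k, n, trials = 8, 20, 2000        # categories, sample size, Monte Carlo runs

true_h = entropy_bits([1 / k] * k)   # population value for equiprobable categories
mean_est = sum(
    plug_in_entropy([rng.randrange(k) for _ in range(n)], k)
    for _ in range(trials)
) / trials

print(f"population information: {true_h:.3f} bits")
print(f"mean sample estimate:   {mean_est:.3f} bits")
```

With small samples the average uncorrected estimate falls noticeably short of the population value, and the shortfall shrinks as n grows.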

In many cases a biased statistic may be used
to provide an unbiased estimate of the population
measure, for example sample variance
$\sigma^2$ gives an unbiased estimate of population
variance $\hat{\sigma}^2$ by the relation

$$\hat{\sigma}^2 = \sigma^2 \times N/(N - 1),$$

where N is the size of the sample. In the case
of information measures several attempts have
been made at evaluating the bias (Carlton, 
1969; Cronholm, 1963; Miller, 1955; Rogers 
& Green, 1955), but no method now available 
is directly applicable to the situation where 
samples are drawn from an unknown popu- 
lation because the size of the bias is strongly 
dependent on the actual distribution of proba- 
bilities in the population as well as on the size 
of the sample. Carlton (1969) has shown that 
bias is estimated by 


£) log: e pigi(n — 1) 
np] 2 (ipi + qf 


1 Now at the University of Birmingham. Requests
for reprints should be sent to Alex W. MacRae, De- 
partment of Psychology, The University of Birming- 
ham, P. O. Box 363, Birmingham 15, United Kingdom. 

where q_i = 1 − p_i, K is the number of categories,
and n is the number of observations.
If we know the set of probabilities in a 
distribution we can use Carlton's technique to 
calculate the bias to be expected in information 
measures based on any size of sample, but this 
is of little practical use because we do not 
usually know the a priori probabilities—and if 
we did, we could use them to calculate unbiased
information measures directly. Instead,
our estimates of bias must be derived from
these same random samples, which can represent
the distribution only imperfectly. This
means that a prime requirement of a general
bias correction method is that it should be
largely independent of the shape of the population
distribution, since the latter will usually
be unknown. If the general nature of the distribution
is known, it may be possible to use this
knowledge to evaluate bias more precisely, but
it is important to know the permissible range
of discrepancy between the theoretical distribution
and those to which the technique may
be applied.

[...] various probability [...]ulated [...]
on an I.C.L. 4130 [...] information measures were
[...] different ways from these [...]
e increases, all informa[...]
asymptotically to their [...]
Program Description


Using the technique described by Moshman
(1967), pseudo-random numbers $R_n$ were
generated in a uniform distribution over the
range 0-1. Appropriate transformations then
converted each $R_n$ to a new random number
$S_n$ falling in an exponential, uniform, normal,
or sine-wave distribution. In all cases the
distributions were limited to the range 0-1, any
values falling outside these limits being discarded.
The sine-wave distributions were added
to Moshman's program not because of any
intrinsic theoretical interest but to increase the
variety of distributions studied and especially
to give a bimodal distribution with simple
parameters. One of the sine waves had only one
positive half-cycle, so that its envelope was
given by $y = A \sin(\pi x/W)$, where $0 \le x \le W$
and $y \ge 0$.² By varying W the width of the
distribution could be changed within the
interval 0-1. When W > 1 various asymmetrical
curves could be produced. The other
sine wave had the form

$$y = A[1 + \sin \pi(4x - 0.5)].$$

This filled the interval 0-1 with two complete
cycles of a sine wave, and so produced a
bimodal distribution.
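The two sine-wave generators can be sketched as follows. This is an illustrative reconstruction, not the original Algol program: it assumes the inverse transformation $S_n = (W/\pi)\cos^{-1}(1 - 2R_n)$ for the half sine wave and a rejection rule for the two-cycle sine wave, with the sign of the 0.5 term taken from the formula in the text. All function names are hypothetical.

```python
import math
import random

def half_sine_sample(w, rng):
    """Inverse-transform draw from y = (pi/2w) sin(pi x / w), 0 <= x <= w.
    Values outside 0-1 are discarded, as the text describes."""
    while True:
        r = rng.random()
        s = (w / math.pi) * math.acos(1.0 - 2.0 * r)
        if 0.0 <= s <= 1.0:
            return s

def two_cycle_sine_sample(rng):
    """Rejection draw from a density proportional to 1 + sin(pi(4x - 0.5)) on 0-1."""
    while True:
        s = rng.random()
        r = rng.random()
        # Keep the pair only when 2r lies under the curve (whose maximum is 2).
        if 2.0 * r <= 1.0 + math.sin(math.pi * (4.0 * s - 0.5)):
            return s

def cell_counts(values, k):
    """Number of values falling in each of k equal sections of 0-1."""
    counts = [0] * k
    for v in values:
        counts[min(int(v * k), k - 1)] += 1
    return counts

rng = random.Random(1)                                   # fixed seed (assumption)
narrow = [half_sine_sample(0.5, rng) for _ in range(2000)]
bimodal = [two_cycle_sine_sample(rng) for _ in range(4000)]
bimodal_counts = cell_counts(bimodal, 8)
print(max(narrow), bimodal_counts)
```

With eight cells the bimodal counts peak on either side of 0.25 and 0.75 and dip near 0, 0.5, and 1, which is the two-humped shape the text describes.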

Figure 1 shows a selection of the distribu- 
tions actually obtained by random sampling 
from these theoretical populations. 

The continuum within the interval 0-1 could
be divided into any number (K) of equal
sections. For a sample of N items, N successive
$S_n$s were generated, and the number falling in
each of the K sections was counted. This was
repeated for a number (S) of samples of the
2 For unit area under a half sine wave of formula
$y = A \sin(\pi x/W)$, A must be $\pi/2W$. The appropriate
transformation is then given by
$R_n = (\pi/2W) \int_0^{S_n} \sin(\pi x/W)\,dx$. Solving for $S_n$ gives
$S_n = (W/\pi) \cos^{-1}(1 - 2R_n)$.
I am indebted to John Cunningham of the University
College of North Wales (UCNW), Department of
Applied Mathematics, for his assistance with this proof,
and to Barry Sanderson of UCNW Computing Laboratory,
who translated Moshman's Fortran program into
Algol and Neat, adapting it to the hardware requirements
of the UCNW installation.

3 No direct transformation gives the bimodal distribution,
but generating two successive uniformly distributed
random numbers designated $S_n$ and $R_n$ and
rejecting the pair if $2R_n > 1 + \sin \pi(4S_n - 0.5)$ gives
the desired distribution of $S_n$.
FIG. 1. Six of the randomly generated
distributions used in the study. (Legible panel labels:
Sine Wave, W = 2; Exponential, Lambda = 3; Normal;
Sine Wave; Rectangular.)


same size; S was always an integral power 
of two. 

When all information calculations had been 
completed with these samples, a new set of 
S/2 samples of size 2N was formed by com- 
bining adjacent pairs of the original samples, 
and new information measures were calcu- 
lated. This sequence was repeated until there 
was only one sample, of size N × S. In the
majority of the cases cited, S was 128 and N
was 20.
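The doubling scheme just described can be sketched in a few lines. The sketch assumes each sample is stored as a list of category counts and that S is a power of two. The function names and the use of a uniform parent distribution are illustrative choices, not taken from the original program.

```python
import random

def draw_counts(n, k, rng):
    """Counts of n uniform 0-1 values falling in each of k equal sections."""
    counts = [0] * k
    for _ in range(n):
        counts[min(int(rng.random() * k), k - 1)] += 1
    return counts

def pool_adjacent(samples):
    """Combine adjacent pairs of samples into half as many samples of twice the size."""
    return [[a + b for a, b in zip(samples[i], samples[i + 1])]
            for i in range(0, len(samples), 2)]

def doubling_levels(samples):
    """Repeat the pooling until a single sample of size N x S remains."""
    levels = [samples]
    while len(levels[-1]) > 1:
        levels.append(pool_adjacent(levels[-1]))
    return levels

rng = random.Random(0)                                   # fixed seed (assumption)
base = [draw_counts(20, 20, rng) for _ in range(128)]    # S = 128 samples of N = 20
levels = doubling_levels(base)
sizes = [(len(level), sum(level[0])) for level in levels]
print(sizes)
```

Because every larger sample is built from the smaller ones, all sample sizes share the same pooled data, which is the property the text later relies on for smooth curves against sample size.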


Bias Elimination 


The methods of correcting for bias were: 

Raw bias (RB). The sample distribution was 
taken as a perfect representation of the parent 
distribution, and bias was calculated by 
Carlton's formula for a sample of that size 
from such a population. In practice this means 
that for every probability encountered, its 
bias contribution was evaluated along with its 
surprisal (— p log p). 

Miller bias (MB). The obtained information
measure was taken as being derived from a set
of equiprobable alternatives. Thus if H is the
information in bits, the number of alternatives
is $2^H$. Miller's (1955) formula, Bias = df/1.3863N,
was used to calculate the bias, taking
df as the indicated number of categories.
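A minimal sketch of the MB correction follows. It assumes that the bias in bits is df divided by 1.3863 times the sample size N (1.3863 being 2 ln 2); the division by N is taken from the usual statement of Miller's correction and is an assumption here. As in the description above, df is set to the number of equiprobable alternatives implied by H. The function names are hypothetical.

```python
import math

def entropy_bits(counts):
    """Uncorrected information H in bits from a frequency table."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def miller_corrected_bits(counts):
    """H plus the Miller-style bias estimate, with df = 2 ** H (assumption noted above)."""
    n = sum(counts)
    h = entropy_bits(counts)
    df = 2.0 ** h                      # indicated number of equiprobable alternatives
    bias = df / (1.3863 * n)           # 1.3863 is 2 ln 2
    return h + bias

example = [5, 5, 5, 5]                 # hypothetical frequency table, N = 20
corrected = miller_corrected_bits(example)
print(entropy_bits(example), corrected)
```

Because the correction needs only H and N, it can be applied by hand without special tables.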

Normal bias (NB). The obtained distribution
was assumed to be drawn from a normally
distributed parent population curtailed by the
limits of the category range in use. The population
mean and standard deviation were
estimated from the sample using Sheppard's
correction for grouping, and the bias expected
from such a population was calculated by
Carlton's formula.

Normal information (NH). Instead of estimating
the bias as in NB and using it to
correct the empirical information, the information
content of the population was directly
calculated. This is much simpler.

Carnor estimate (CN). The information
content of a normally distributed population
was first calculated for each of a number of
standard deviations. At each of these the
expected bias was found by Carlton's formula
for each of the relevant sizes of N. Plotting
population information against sample (biased)
information permits prediction of the former
from the latter and was used in this way by
MacRae (1970). In calculating CN the biased
and unbiased estimates were calculated in
advance by the program Carnor,⁴ and the
present program had only to interpolate between
the calculated values.

Carrec estimate (CR). This used the same
calculation, but assumed a rectangular distribution
and used data from the program
Carrec.


Subsequent Calculations 


Some of these techniques give direct estimates
of unbiased information and some
estimate the bias. Where bias was estimated
it was used to correct the information (H)
which had been calculated directly from the 
sample. These calculations were performed for 
several samples of each size so that the mean 
and standard deviation could be found for 
information estimated by each technique. 

Samples of different sizes were all drawn 
from the same parent population because each 


4 The programs Carnor and Carrec in I. C. L. 4100
Algol and sample graphs relating biased to unbiased
information have been deposited with the National
Auxiliary Publications Service. Order document No.
00681 from National Auxiliary Publications Service
of the American Society for Information Science,
c/o CCM Information Sciences, Inc., 909 West 3rd
Avenue, New York, New York 10032. Remit in advance
[amount illegible] for photocopies or $1.50 for microfilm and make
checks payable to: Research and Microfilm Publications, Inc.
larger sized sample was obtained by collapsing
together two of the smaller ones. The parent
population (i.e., the largest sample, of size
N × S) would not as a rule reproduce exactly
the theoretical distribution from which it was 
drawn, hence (a) the estimates were not being 
tested for efficiency in the case of theoretically 
exact distributions only, and (b) the vari- 
ability between different sample sizes was 
minimized so that it was possible to obtain 
fairly smooth curves relating information 
estimates to sample size. 


EMPIRICAL RESULTS 


Table 1 compares the accuracy of the various 
techniques as influenced by distribution shape
and sample size. Table 2 tabulates the practical 
advantages of each method. 

It can readily be seen that each method has 
some advantages. All of the correction methods 
are capable of giving a better estimate than 
does H, though the improvement with some of 
them applies only when used with a restricted 
class of distribution. 

RB does not correct well when N is small
because, as Carlton has said, it is itself subject
to a bias analogous to that in H. It is reasonably
efficient with N = 4K and excellent when
N ≥ 8K. It makes no assumptions about the
probability distribution and requires no more
than nominal scaling of the data. It is laborious
to calculate because a formidable function
must be evaluated for each probability encountered,
but if a computer is used to calculate
H it is easy to add the RB correction to the
program. If H is being calculated by hand from
tables of −p log p it is no extra labor to have
them corrected for RB, though a different
column must be used for every N. Table 4 is
an abbreviated form of such a table, covering
the most useful range of Ns.

MB can be calculated without special tables
(it is approximately equivalent to
calculating H with an N of double the size),
but with N small it was the least effective.

NB gave good correction with N no larger
than K, and excellent correction with larger
Ns. It was very effective with nonuniform
distributions and was satisfactory with
TABLE 1
ACCURACY OF CORRECTION TECHNIQUES WITH VARIOUS DISTRIBUTIONS
Technique 
Distribution 
H RB MB NB NH CN cm. eee Ton 
Exponential 18 * see + * 
A=3 4.9 * * eek 3k ek * 3k 
1 5 ee ** bid d eee *** ** a 
Exponential 14 a a 
A — 10 44 sa * ** * * * 
1 2 RE dk Lil doe koe ook 
Normal 18 ** * m 
SD. 49 bid * koe ak +e see 
1 1 A» Ld Lid * Lid Lid kkk 
Normal 10 ** 2 ** 
SD = 05 2.5 doe + d e ook * eek 
0.6 ee *** ^ Lid Lind Gode Lis d 
Normal 19 3 de xe 
SD = .05 (80 cells) 6.0 bs e sre »- * 
12 eee ** ee *** Lind *x Lid 
Sine 17 ** dole * 
Wai 43 ke * ++ * ok ** 
1 2 a ee oen bild ee eK 
Sine 18 - r ae 
W=2 4.9 Lnd *»* ee * oko +e 
12 +*+ *+* Lond * ** kkk blaka 
18 * dece 
Bindal 45 ** * ** »* ** ake 
1 3 ee ** +e ee doe kkk 
19 * ** 
Rectangular 4.3 bid fer bid sid * 
1.0 hee * ee *» dee d» 


Note. For each distribution the H column shows the percentage error in H with N = 20, 80,
and 320. The corresponding entries in the other columns show the maximum percentage error
of each information estimate with equal or greater Ns on the following scale:
* = 2% error; ** = 1% error; *** = .5% error. Each distribution was divided into 20
sections, with the exception of the one with 80 cells. It was included for comparison
because its information content is similar to that given by 20 cells with SD = 0.2.
a No technique gave 2% accuracy, but NB and CN were best, both being accurate to 3%.



uniform ones. This makes it the most widely
applicable technique though its complexity
makes electronic computation essential.

NH gave very consistent results: its estimate
was as good with N = K as with N very
large. Unfortunately this meant that it did not
improve as N increased, so that if the estimate
was poor with N small it remained poor, and
was often worse than H itself when N was very
large. It can be hand-calculated (using tables
of normal probability), but except for very
small Ns or where the distribution is known to
be nearly normal, it does not justify the labor.

CN is easy to use if graphs or tables from
Carnor are available. It gave good results with
all markedly nonuniform distributions and can
be used with nominally scaled data or bimodal
distributions because only the set of probabilities
(and not their arrangement) is assumed.
If fewer than 20% of the nonzero probabilities
are smaller than a fifth of the size of the
greatest one, then CR is likely to be better.
TABLE 2


PRACTICAL ADVANTAGES OF CORRECTION TECHNIQUES 


Technique 
Advantage - 
H | RB|MB| NB|NH|CN | CR | £N m 
i * * * * Zz " 
Hand calculation practicable - 
A * * * P " 7 
Applicable to nonmetric data 
Applicable to known rectangular distributions « |» * $ A 
Applicable to known normal distributions a ie Pie he 
Applicable to known exponential (and skewed normal) ES * P 
distributions 
Applicable to known bimodal distributions |s Pr "ades 
Applicable to unknown distributions & dux s 
Gives reliably good estimate (where applicable) with V = K ela lul 
Gives reliably good estimate (where applicable) with V = 4K * e m wx z 
Conservative (never overestimates //) giles 
% 


CR worked well with near uniform (rec- 
tangular and sine) distributions and less well 
with markedly nonuniform ones. In other 
respects it resembled CN. 

As a rule, those methods which gave the 
lowest information estimate also had the lowest 
variability. This is not an advantage but only 
a reflection of the numerically smaller values 
from which both the mean and variance were 
calculated. Of the efficient techniques, NB was 
somewhat less variable than CN and CR. A 
more important finding was that variability in 
H (and even more markedly in the corrected 
estimates) decreased at a rate greater than that 
to be expected from the increasing size of the 
sample. For example, with 2,560 numbers
sampled from a uniform distribution and
divided into 20 categories of size, the variances
of the sample estimates with the four most
appropriate techniques vary with the sample
size in the way shown in Table 3. Dividing in
each case by S − 1 assesses the variance of an
estimate based on the whole sample of 2,560.

With the total number of observations in any one
distribution study equal to N × S, the variance
of an estimate based on the entire distribution
was greatest when it was derived from the
average of S measures each based on N observations.
It was less than half as great when
derived from S/4 measures each based on 4N
observations. This means that increasing the
Ns on which information estimates are based
is the most efficient way of reducing random
error, so it is better to double the number of
observations on each subject than to double
the number of subjects.

Results have not been given for cases where 
N < K (where K is the effective number of 
nonzero categories, rather than the total 
number of theoretically available categories). 
It is doubtful if any technique can be relied on 
to give satisfactory corrections with very small 
Ns. In addition to sampling bias, MacRae
(1970) has described a number of other effects
which distort information measures, and all
increase as N becomes smaller.

If bias corrections are being used to improve 
transmission estimates, the minimum number 
of observations is determined by the number 
of categories used in response to each individual 
stimulus. Transmission (T) may be found from 
T = H(R) — HS(R), where H(R) is response 


TABLE 3
VARIANCES OF INFORMATION MEASURES DERIVED FROM
2,560 RANDOM NUMBERS UNIFORMLY DISTRIBUTED
OVER 20 CATEGORIES AND DIVIDED INTO
SAMPLES OF VARIOUS SIZES

                     Sample arrangement
Technique     N = 20      N = 80      N = 320
              S = 128     S = 32      S = 8
H             2669        326         32
              21          11          5
RB            4586        360         32
              36          12          5
NB            2867        326         32
              23          11          5
CR            12085       502         35
              96          16          5

Note. The upper entry in each cell is the empirically observed
variance in information derived from S samples; the lower entry
in each cell is the observed variance divided by S − 1 to estimate
the variance of the mean of all the S samples. All values have
been multiplied by 10^5.



information and HS(R) is response equivoca- 
tion. The latter is the average information in 
responses to individual stimuli. 
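The relation T = H(R) − HS(R) can be sketched for a stimulus-response frequency table. This is an illustrative sketch only: the function names are hypothetical, and weighting each stimulus by its share of presentations is an assumption (it reduces to a simple average when every stimulus is presented equally often).

```python
import math

def entropy_bits(counts):
    """Information in bits from a list of frequencies."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def transmission_bits(matrix):
    """matrix[s][r] is the number of times response r was given to stimulus s."""
    total = sum(sum(row) for row in matrix)
    response_totals = [sum(col) for col in zip(*matrix)]
    h_r = entropy_bits(response_totals)            # response information H(R)
    # Response equivocation HS(R): average information in responses to each stimulus.
    hs_r = sum((sum(row) / total) * entropy_bits(row)
               for row in matrix if sum(row) > 0)
    return h_r - hs_r

perfect = [[10, 0], [0, 10]]      # hypothetical data: each stimulus always gets its own response
chance = [[5, 5], [5, 5]]         # hypothetical data: responses unrelated to stimuli
print(transmission_bits(perfect), transmission_bits(chance))
```

In a bias-corrected version, each row's entropy would be corrected using the number of responses to that stimulus as N, which is why the row totals set the minimum number of observations.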

The N to be used in calculating bias in
HS(R) is the number of responses to each
stimulus. It must be at least equal to the
number of response categories which may
actually be used. Where transmission is good
this number may be quite small, but with low
transmission it will often equal the number of
available response categories. In either case the
N associated with H(R) is much larger, being
the sum of all the Ns in HS(R). For this reason
the bias in H(R) is always much smaller than
that in HS(R) and may usually be considered
negligible.

The foregoing should be considered to be
minimum values for N. It is desirable that N
should be as large as is experimentally convenient
because increasing N is the most
efficient way of reducing sampling variability.


PRACTICAL RECOMMENDATIONS 


It should be remembered that the nominally 
“normal” distributions were in fact confined 
to the range 0-1 so that with a standard 
deviation (SD) of 0.2 the tails were cut off at
2.5 SDs. This reduced the number of very 
small ps and so made the distribution more 
uniform. This accounts for the relatively poor 
performance of NH and CN (and the relatively 
good performance of CR) on the 20 cell, SD 0.2 
distribution as compared with the 80 cell, 
SD 0.05 distribution which is superficially 
equivalent. 

It is precisely its ability to take account of 
curtailment which makes NB so effective. An
exponential distribution is quite well repre- 
sented by a section of normal curve with its 
mean near one end of the permitted range, and 
a uniform distribution is adequately shown by 
a section from a normal curve of large SD and 
with a central mean. Since it adequately 
corrects rectangular, normal, and exponential 
distributions it should also suit intermediate 
skewed ones. Hence, although the correction 
involves the calculation of normal curve 
parameters and so should theoretically be 
confined to use with data measured on at least 
an interval scale, it is also appropriate with 
most ordinal scales, the only provisions being 
that the resulting distribution should be 
unimodal and not too narrow. With a very 
narrow distribution (up to two cells wide) a 
valid assessment of the standard deviation is 
not possible, even using Sheppard’s correction. 

NB is misled by a markedly bimodal distri- 
bution because the calculated standard devi- 
ation is large and the mean central, so that the 
estimated bias approximates to that for a 
uniform distribution, whereas the relative 
numbers of each size of probability (and hence 
the appropriate correction) are similar to that 
for a normal distribution of the same H. In 
such a case CN could be used, even with small 
Ns, but markedly bimodal distributions are 
rather rare in the sorts of situations where the 
technique might be applied; so unless theo- 
retical considerations or examination of the 
data suggest bimodality, NB seems the ideal 
technique for most applications. The ad- 
vantage of being virtually independent of the 
shape of the distribution sampled outweighs 
its minor disadvantages. 

Because its calculation is very complex, it is 
only practicable to use NB if a computer is 
available. It is not practicable to compile 
tables for future use because NB has four
TABLE 4
A TABLE FOR THE DIRECT CALCULATION OF INFORMATION CORRECTED BY THE RAWBIAS TECHNIQUE

Sample size (N)
Frequency | 16 | 20 | 24 | 30 | 36 | 40 | 50 | 60 | 80 | 100
1 2991 2557 2242 1903 | .1660 | .1532 1291 | .1120 | .0892 | .0746 
2 4225 3711 3317 | .2872 | .2542 | .2365 | .2022 1773 | .1434 | .1212 
3 4964 | .4468 4060 | .3577 | .3203 | .2999 | .2504 | .2205 | .1878 | .1599 
4 5395 | .4980 | .4599 | .4117 | .3728 | .3509 | .3068 | .2734 | .2259 | .1937 
5 5600 5310 | .4986 | .4537 | .4152 | .3929 | .3469 | .3112 | .2506 | ‘2939 
" 5626 5496 | .5254 | .4860 | .4496 | .4277 | .3813 | .3443 | .2806 | 2511 
7 5502 5563 5422 | .5104 | .4773 | .4566 | .4109 | .3734 | .3166 | .2760 
8 5250 5527 | .5505 | .5279 | .4994 | .4803 | .4363 | .3990 | .3411 | .2988 
9 4886 5402 | .5512 | .5395 | .5164 | .4995 | .4582 | .4217 | .3633 | .3198 
10 4423 5197 | .5453 | .5457 5290 | .5147 | .4769 | .4417 | .3835 | .3392 
11 3869 | .4920 | .5335 | .5472 5377 | .5263 | .4927 | .4593 | .4020 | .3572 
12 3234 | .4577 | .5162 | .5443 5427 | .5346 | .5059 | .4747 | .4188 | .3739 
13 2525 4175 | .4939 | .5374 5443 | .5400 | .5167 | .4881 4340 | .3803 
14 1746 | .3718 | .4670 | .5268 | .5429 | .5426 | .5253 | .4997 | .4480 | .4037 
15 0903 | .3209 | .4358 | .5128 | .5387 | .5426 | .5318 | .5096 | .4606 | .4170 
16 2652 | .4006 | 49056 | .5317 | .5402 | .5364 | .5178 | .4720 | .4294 
17 2050 | .3617 | .4753 | .5223 | .5356 | .5392 | .5246 | .4823 | 4409 
18 1406 | .3192 | .4523 | .5105 | .5288 | .5408 | .5209 | 4915 | 4515 
19 0722 | .2734 | .4266 | .4965 | .5201 5398 | .5340 | .4998 | (4613 
20 2245 | .3984 | .4804 | .5094 | .5378 | .5367 | .5071 | .4704 
21 1725 | .3677 | .4623 | .4970 | .5344 | 5383 | .5135 | 788 
22 1177 | .3348 | .4423 | .4828 | .5296 | .5387 | .5190 | .4864 
23 0601 2997 | .4205 | .4670 | .5234 | .5380 | .5237 | .4934 
24 2625 | .3969 | .4497 | .5161 5363 | .5276 | .4998 
25 2234 | .3717 | .4308 | .5075 | .5335 | .5308 | .5056 
26 1823 | .3448 | .4105 | .4978 | .5299 | .5333 | .5108 
7 1393 | .3165 | .3888 | .4869 | .5252 | 15351 5155 
28 0946 | .2866 | .3658 | .4750 | .5197 | 15362 | .5196 
20 0481 2553 | .3415 | .4621 5134 | .5366 | .5232 
30 2226 | .3159 | 4481 5062 | .5365 | .5263 
à = | 4886 | .2892 | .4332 | 4982 | .5357 | 5280 
3 1333 | .2613 | .4174 | .4805 | .5343 | ‘5311 
A «1168 | .2322 | .4007 | .4799 | .5324 | ‘5398 
E 0790  .2021 | .3831 | .4697 | .5300 | “5341 
es .0401 | .1709 | .3647 | 4588 | .5270 | 5349 
= A eme -1387 | 3454 | 4471 5235 | .535. 
"€ | 1054 | .3253 | 4348 | i5195 | 5354 
x 0712. | .3045 | .4219 | 15150 | .s3s0 
- 0361 | .2828 | .4083 | .5100 | .3343 
4 2605 | .3941 | .5046 | 15332 
—— P d^ 2374 | 3793 | 4987 | 5317 
4b .2137 3639 4924 5299 
y | | 1892 | .3479 | .4857 | “5079 
E J641 | 3314 | 785 | “5953 
45 m a 7 ES y SS | AN 
| es | —— 
——— | «1118 | .29 == 
46 | | | 968 | .4630 | 5193 
47 
48 | 
49 
50 l ERE F pu me 
2 | To use the table enter nor le m 
Note.” not practicable. For. other samp'entry contain : 
column te Distribution. Because each table inih ie: HERR UIFER nent the tota] Tepresents an, 


rris not asa » permit correction ol rA ind H(R) components in mm " 
NI decays much smaller than that in HS(R), and PR in its effect on 7, Sing "gnsmission estimate. 
x tans estimate is given by correcting only HS(R). * RB tends to undere 
ble mission estima 
HR) ble transmi ni 


+ but th 


UNBIASED INFORMATION MEASURES 


parameters (K, N, M and SD) so that either
the tables would have to be inordinately bulky
or laborious interpolations would be required
when they were used. Instead Algol procedures
for direct calculation of H and NB from raw
data in frequency tables have been made
available.⁵

One of the attractions of information 
measures has always been their applicability 
to nominally scaled, nonmetric data. NB cannot
be used in such applications because it
involves the calculation of normal curve
parameters (and hence inadmissible arithmetic
operations). RB, MB, CN, and CR may
all be used, but only three of these are recom- 
mended. Where the distribution can be classed 
as clearly nonuniform CN is best, with any 
size of N; if it is nearly uniform CR gives the 
best estimate. 

CN invariably gives a higher information 
estimate than does CR. For nonuniform distri- 
butions with a smaller proportion of low 
probabilities than the normal curve neither is
completely accurate, CN giving too high and
CR too low an estimate, and with the exponential
λ = 10 distribution (which has an
even higher proportion of low probabilities
than the normal curve) even CN underestimated
the bias. If insufficient evidence
exists to classify a distribution either as nearly
uniform or as markedly nonuniform, a safer
estimate (minimizing the maximum error) is
given by the mean of CN and CR. This mean
is a poor estimate only where either CN or CR
is a good one.


5 Procedures Info, Rawbias and Norbias in I. C. L.
4100 Algol, and an extended version of Table 4, have
been deposited with the National Auxiliary Publications
Service. Order document No. 01256 from National
Auxiliary Publications Service of the American Society
for Information Science, c/o CCM Information
Sciences, Inc., 909 West 3rd Avenue, New York, New
York 10032. Remit in advance $5.00 for photocopies
or $2.00 for microfilm and make checks payable to:
Research and Microfilm Publications, Inc.
The foregoing recommendations apply to
cases where N is small. If N ≥ 4K then RB
gives a good estimate requiring no assumptions
at all about the distribution. It is usually
better than the mean of CN and CR and becomes
as good as the excellent NB when
N = 8K. Its freedom from restrictive assumptions
and its applicability to nonmetric data,
along with its relatively simple calculation
using Table 4, make this an attractive technique.
But achieving an N equal to 4K may
be experimentally laborious. If 100 stimuli are 
presented, and each may give rise to any of 10 
responses (with different probabilities), a total 
of 4,000 presentations must be made to each 
subject whose transmission is to be measured. 
Experimental designs on this scale have of 
course been used, but they have been the 
exception rather than the rule. 

In some cases it may be desirable to keep the 
number of judgments by each subject to a 
minimum, and then one of the more powerful, 
though restrictive, techniques may have to be 
used. 
In a recent paper, Spevack and Suboski (1969) reviewed the evidence bearing upon
the nature of the retroactive effects of electroconvulsive shock (ECS) upon learned
responses, and they proposed an interpretation for the wide variety of retrograde
amnesia gradients, ranging from seconds to hours, reported in the literature. They
suggested that the “true” retrograde amnesia gradient occupies only several seconds
following learning, and that longer retrograde amnesia gradients reflect, not an ECS
effect upon memory, but the halting by ECS of the incubation, that is, change over
time of an experimentally produced conditioned emotional response (CER). The
present paper points out that the evidence for Spevack and Suboski's interpretation
is weak. An explanation of the wide variety of retrograde amnesia gradients, in
terms of treatment effectiveness, is put forward as a more reasonable alternative.


During the past several years, a great many
studies have shown that electroconvulsive
shock (ECS), administered shortly following a
learning experience, exerts a disruptive effect
on subsequent performance (Glickman, 1961;
McGaugh, 1966). This effect is generally
interpreted as support for the consolidation
hypothesis of memory storage (e.g., Gerard,
1949, 1955; Hebb, 1949). Essentially, this
hypothesis assumes that following learning the
neurobiological bases of memory are first in a
labile state and then gradually become fixed
(consolidated) into a more permanent form.
The ECS-produced retrograde amnesia gradient
is assumed to assess the time course of
consolidation.

There have been, however, a number of
alternative interpretations of the basis of the
retrograde disruption of performance produced
by ECS. These arguments have attempted to
explain away the apparent retrograde amnesia
as due to other effects of the ECS treatment
that interfere with the animal’s performance


1 The preparation of this review was supported by
predoctoral traineeship MH 11095 and predoctoral
fellowship MH 44722 from the National Institute of
Mental Health, United States Public Health Service to
the author, and by research grant MH 12526 from the
National Institute of Mental Health, United States
Public Health Service to Dr. J. L. McGaugh.

2 The author gratefully acknowledges the assistance of
Dr. J. L. McGaugh, whose suggestions and encouragement
were invaluable in the preparation of this review.


(e.g., Coons & Miller, 1960; Lewis & Maher,
1965). These alternative suggestions have not
led to any drastic revision of the conventional
notions because of their lack of generality (cf.
McGaugh & Petrinovich, 1966) and their
failure to provide an adequate explanation of
the retrograde amnesia gradient.

Recently, Spevack and Suboski (1969) have
suggested still another alternative interpretation
of the retroactive effects of ECS. Their hypothesis
has an apparent advantage over previously
offered alternatives in that it attempts
to explain the retrograde amnesia gradient.


The CER Incubation Hypothesis


Spevack and Suboski argued that the ECS-produced
retrograde amnesia gradient is generally
short (i.e., less than 30 seconds) and that
in inhibitory (i.e., passive) avoidance situations,
the behavioral effects seen when ECS
is given at longer intervals following training
have been misinterpreted as ECS effects on
memory.

They suggested that in the inhibitory avoidance
situations typically used to study retrograde
amnesia, what is produced as a consequence
of punishment is not an avoidance
contingency, but rather a CER to the
apparatus cues (cf. Chorover & Schiller, 1965,
1966). This CER increases in strength
(incubates) following punishment, and this
incubation is assumed to underlie the increase
in avoidance response strength (cf. Pinel &
Cooper, 1966a, 1966b) and the long retrograde
amnesia gradient. Furthermore, ECS is assumed
to halt CER incubation such that the
behavioral manifestations, that is, crouching,
freezing, urination, and defecation, are at the
same level on a retest as they were when ECS
was given. As the training-ECS interval is increased,
avoidance is presumed to increase,
not because more consolidation has taken
place, but because the CER has increased in
strength before being halted by ECS. The
authors further suggested that since CER
incubation underlies the retrograde amnesia
gradient, "any experimental manipulation
which affects the ECS gradient will have a
similar effect upon the incubation gradient and
vice versa [Spevack & Suboski, 1969, p. 71]."

The present analysis points out several flaws 
in the Spevack and Suboski thesis. The authors 
suggested that there is a dichotomy between 
short “true” retrograde amnesia gradients and 
longer retrograde amnesia gradients, yet they 
fail to consider the wealth of evidence (e.g., 
Alpern & McGaugh, 1968; Cherkin, 1969; 
Dorfman & Jarvik, 1968; Miller, 1968;
Weisman, 1963), which suggests that retrograde
amnesia curves should not be interpreted
as consolidation curves. Retrograde amnesia
curves are susceptibility curves, and as such 
depend both upon the processes which are sus- 
ceptible to modification, as well as the treat- 
ments used to assess susceptibility. Thus, for 
any particular task, there is not one retrograde 
amnesia curve; rather, many may be generated 
simply by changing treatment parameters 
(Cherkin, 1968) or response criteria (Schneider, 
Kopp, Aron, & Jarvik, 1969). Such considera- 
tions suggest strongly that the basic dichotomy 
of short and long gradients of retrograde 
amnesia proposed by Spevack and Suboski is 
untenable. 

Concerning the CER incubation hypothesis 
specifically, this review points out that (a) 
there is no evidence that the retrograde 
amnesia gradient is necessarily short when a 
CER is not produced; (b) Spevack and Suboski
did not put forward any convincing evidence
that ECS halts incubation of a CER; (c) they
failed to make an experimental distinction 
between CER and contingent avoidance con- 
tributions to the avoidance response. As a re- 
sult they encounter difficulty in (a) establishing 
that CERs are the major factor mediating
avoidance in the inhibitory avoidance situation;
(b) specifying the nature of the CER
incubation process (in order to compare it to 
the ECS gradient); and (c) distinguishing 
between possible effects of ECS upon consolida- 
tion from effects upon CER incubation in their 
own or others' experiments. 


The Short Retrograde Amnesia Gradient 


The first major assumption of Spevack and 
Suboski’s thesis was that the true retrograde 
amnesia gradient is short. Their evidence came 
from two sources: a one-trial inhibitory avoid- 
ance paradigm (Chorover & Schiller, 1965, 
1966) and one-trial discriminated avoidance 
situations (e.g., Pfingst & King, 1969; Suboski, 
Spevack, Litner, & Beaumaster, 1969). 

The results of Chorover and Schiller (1965) 
suggest that the amnesic effects of ECS in a 
one-trial step-down task are observed only if 
ECS is administered within 10 seconds of the 
trial. They stated that longer gradients ob- 
served in other studies are the result of a con- 
founding influence due to the development of 
a CER. However, a closer inspection of 
Chorover and Schiller’s procedure and data 
suggests an alternative explanation. Their pro- 
cedure included the use of a 30-second maxi- 
mum cutoff latency; that is, during the retest 
all subjects avoiding for 30 seconds achieved 
the maximum latency score possible (i.e.,
"perfect" retention). When the duration of 
footshock administered to subjects upon the 
initial step-down was long (2 seconds or 4 sec- 
onds) the great majority of subjects met the 
30-second criterion, and there was no apparent 
amnesia. When footshock duration was short 
(0.5 seconds or 1 second), the majority of sub- 
jects avoided for less than 30 seconds. These 
groups showed amnesia since they exhibited
shorter median step-down latencies than sub- 
jects given footshock alone (18.50 and 20.00 
seconds versus 27.50 and 29.75 seconds). The
absence of an ECS effect at 30 seconds seen 
with longer durations of footshock may, there- 
fore, reflect a ceiling effect rather than the 
absence of an amnesic effect of ECS. Schneider 
et al. (1969) have tested this prediction directly
in both rats and mice in a one-trial inhibitory 
avoidance situation (cf. Kopp, Bohdanecky, & 
Jarvik, 1966). Following the trial, animals were 
given ECS after 10 seconds, 1 hour, or 6 hours. 
They were subsequently retested and the cutoff
latency for a response was set at 30 seconds
(cf. Chorover & Schiller, 1965), 300 seconds,
or 600 seconds (cf. Kopp et al., 1966). The
choice of cutoff latency drastically alters the 
length of the ECS susceptibility gradients. 
With a 30-second cutoff the gradient extends 
to 10 seconds only, in both rats and mice, but 
with a 600-second cutoff it extends for 6 hours 
in rats and 1 hour in mice. In these situations, 
the ECS effect is not an all-or-none phenome- 
non. The suggestion is that in order to show 
that two groups are equivalent, the experiment 
must be designed so that the majority of sub- 
jects in the experimental groups are allowed to 
make the previously punished response. 
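
The ceiling-effect argument can be illustrated with a minimal sketch. The latencies below are hypothetical values chosen only for illustration, not data from any study cited here, and the function name is likewise an assumption. The sketch caps every latency at the cutoff and compares group medians under a 30-second and a 600-second cutoff.

```python
from statistics import median

def censored_median(latencies, cutoff):
    """Median step-down latency when every score is capped at the cutoff."""
    return median(min(x, cutoff) for x in latencies)

# Hypothetical retest latencies in seconds (illustrative only).
footshock_only = [400, 520, 600, 600, 600]
footshock_plus_ecs = [45, 90, 150, 210, 300]

for cutoff in (30, 600):
    control = censored_median(footshock_only, cutoff)
    treated = censored_median(footshock_plus_ecs, cutoff)
    # A difference of zero under the short cutoff reflects censoring,
    # not equivalence of the two groups.
    print(f"cutoff={cutoff} s  control={control}  ECS={treated}  "
          f"difference={control - treated}")
```

Under these assumed values the two groups are indistinguishable with the short cutoff and clearly separated with the long one, which is the pattern the argument above attributes to the choice of cutoff latency.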

The results from one-trial discriminated 
avoidance paradigms have failed to show 
graded effects of ECS beyond 30 seconds (e.g., 
Pfingst & King, 1969; Suboski et al., 1969). 
However, it seems inappropriate to conclude
that retrograde amnesia gradients in one-trial 
discrimination tasks are necessarily short since, 
in this type of task, the consequences of manipulating
ECS parameters (e.g., Alpern &
McGaugh, 1968; Dorfman & Jarvik, 1968) 
have not as yet been investigated. 

To substantiate their case further, Spevack 
and Suboski also discussed the results from 
studies using appetitive learning tasks, in 
which the question of CER production did not 
arise. As a consequence, their hypothesis pre- 
dicted a short retrograde amnesia gradient, but 
this does not appear to be the case. In a study 
by Herz (1967), in which the duration of the 
effects of ECS in a one-trial appetitive situa- 
tion was investigated, almost total amnesia was 
evidenced on several retest measures of 
approach and consummatory behavior follow- 
ing an ECS administered 80 seconds after 
learning commenced. Herz also reported evi- 
dence for a partial amnesia on two of nine 
measures when the ECS was administered up 
to 30 minutes after the training. Tenen (1965a,
1965b) found an amnesic effect in a similar 
situation with ECS delays of up to 1 hour. 
These findings are completely at odds with the 
CER incubation hypothesis, but these and 
other gradients can be rationalized when 
treatment effectiveness is considered. Pinel 
(1969a), in rats, demonstrated a 1-minute


retrograde amnesia gradient using a 60-milliampere
current for 0.5 seconds, whereas
Tenen (1965a, 1965b), in rats, demonstrated
a 1-hour retrograde amnesia gradient using
a 150-milliampere current for 0.2 seconds.
Herz’ (1967) studies revealed an intermediate
gradient using an 18.5-milliampere current
of 800-millisecond duration in mice (cf.
Dorfman & Jarvik, 1968).

Specification of the time course of consolida- 
tion would seem to require a closer examination 
of the nature of the effect of a variety of treat- 
ment strengths upon consolidation. Do all 
treatments halt consolidation, or do the 
weaker treatments merely slow it down? If the 
latter is the case, then determination of retro- 
grade amnesia gradients after 24 hours, as is 
typical, would lead to an underestimation of 
the duration of ECS susceptibility (Cherkin,
1969). An inspection of Figure 1 may clarify
this point. This model is a modified version of
the one recently proposed by Cherkin (1969).
According to the model, amnesic agents may
stop, or simply slow, consolidation. The differential
amnesic effects of different treatments are
assumed to be due to their effectiveness in
slowing or stopping consolidation. The retrograde
amnesia gradient is graphed by plotting
above each ECS treatment the point indicating
the maximum strength of trace attained at the
time of retest following the treatment. The
more effective the treatment, the lower the
trace asymptote would be at retest, and the
longer the retrograde amnesia gradient observed.
Notice that one prediction of trace
slowing is that recovery of memory should
occur with weak treatments (cf. Pagano,
Bush, Martin, & Hunt, 1969). A more detailed
review of this and other models has recently
been attempted (McGaugh & Dawson, in
press).
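
The stop-versus-slow idea can be sketched numerically. Everything quantitative here is an assumption made for illustration: the exponential growth curve, the time constant, the amnesia threshold, and the "slowing" factor are not taken from Cherkin (1969) or from Figure 1. The sketch treats a treatment given at some delay as multiplying the subsequent rate of consolidation by a factor between 0 (stop) and 1 (no effect), and asks how long a gradient would be measured at retest.

```python
import math

def trace_strength(effective_time, tau=600.0):
    """Assumed trace growth: rises from 0 toward an asymptote of 1."""
    return 1.0 - math.exp(-effective_time / tau)

def trace_at_retest(delay, retest_time, slowing):
    """Trace strength at retest when treatment is given `delay` s after training.

    slowing = 0.0 stops consolidation; 0 < slowing < 1 merely slows it.
    """
    effective = delay + slowing * (retest_time - delay)
    return trace_strength(effective)

def gradient_length(delays, retest_time, slowing, threshold=0.95):
    """Longest treatment delay that still yields measurable amnesia at retest."""
    amnesic = [d for d in delays
               if trace_at_retest(d, retest_time, slowing) < threshold]
    return max(amnesic) if amnesic else 0

delays = [10, 60, 600, 3600, 21600]   # seconds after training
day = 24 * 3600                       # retest 24 hours after training

print(gradient_length(delays, day, slowing=0.0))    # treatment that stops consolidation
print(gradient_length(delays, day, slowing=0.05))   # treatment that only slows it
# Recovery with a weak treatment: amnesia at a 1-hour retest, none at 24 hours.
print(trace_at_retest(10, 3600, 0.05), trace_at_retest(10, day, 0.05))
```

Under these assumptions the stopping treatment yields a longer measured gradient than the slowing treatment at a 24-hour retest, and the slowed trace recovers between the 1-hour and 24-hour retests, which is the qualitative pattern described above.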


Effects of a Single ECS upon a CER 


Spevack and Suboski (1969) cited only one
experiment in support of the notion that a
single ECS may have a disruptive effect upon
a CER. In this study (Chorover & Schiller,
1966) rats were allowed to explore a large
compartment, and a record was kept of their
locomotor activity. Subjects were subsequently
removed to a smaller box in which they
received 20 inescapable footshocks over a
1-minute period. Half of the subjects received
ECS and half received pseudo-ECS 1 minute
following this procedure. Prior to treatment,
there were no significant differences between
these groups on the two measures of locomotion
employed, that is, boundary crossing and
number of squares entered in the large chamber
(ECS group medians: 104.5 and 45.5; pseudo-ECS
group medians: 101.0 and 40.1). On the
retest, ECS groups exhibited significantly more
locomotor activity (14.0 and 13.5) than
pseudo-ECS groups (7.5 and 7.9). Yet, differences
in relative reduction of activity (i.e.,
the difference between pre- and posttraining
median scores) are small. The ECS subjects
showed a reduction of 90.5 and 32.0. Pseudo-ECS
groups showed a reduction of 93.5 and
32.2. The utilization of individual difference
scores (a more appropriate index of relative
reduction of activity) may have failed to find
significant differences between the groups (cf.
Zornetzer & McGaugh, 1969).

Fig. 1. Long and short retrograde amnesia (RA) gradients depending upon
the severity of ECS disruption of memory consolidation. (Adapted from
Cherkin, 1969.)
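
The reduction scores quoted above follow directly from the group medians. As a check on the arithmetic, the short sketch below recomputes them; the variable and function names are illustrative, and the numbers are the medians given in the text.

```python
# Group medians quoted in the text: (boundary crossings, squares entered).
pre = {"ECS": (104.5, 45.5), "pseudo-ECS": (101.0, 40.1)}
post = {"ECS": (14.0, 13.5), "pseudo-ECS": (7.5, 7.9)}

def reductions(group):
    """Pre- minus posttraining median for each locomotion measure."""
    return tuple(round(before - after, 1)
                 for before, after in zip(pre[group], post[group]))

for group in pre:
    print(group, reductions(group))
```

The two groups differ by 3.0 on the first measure and by 0.2 on the second, against pretraining medians of roughly 100 and 40.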

At best, such evidence would seem to be but 
a preliminary indication of CER disruption by 
a single ECS. The experiment clearly requires 
replication; it should not be considered as a 
cornerstone for a comprehensive hypothesis. 


Additionally, this experiment examines only
one training-ECS interval (1 minute). Obviously,
more complete analyses of the effects of
delayed ECS are necessary before these results 
are extrapolated to encompass possible effects 
of ECS upon a CER at much longer intervals. 
Moreover, such analyses are necessary in order 
to assess the nature of the effect; for instance, 
does ECS arrest incubation as Spevack and 
Suboski proposed? If so, this must be rationalized
alongside the recent findings (Geller &
Jarvik, 1968; McGaugh & Landfield, 1970)²
that following a 20-second delayed ECS, after
an inhibitory avoidance trial, the avoidance
response is first strong, then declines over the
next several hours. Since incubation phenomena
were assumed by Spevack and Suboski
to make an important contribution to the
avoidance response only after longer intervals,
the inhibitory avoidance exhibited after a 20-second
ECS can hardly be mediated by the
CER. Moreover, if the ECS merely halts CER
incubation, how can this effect explain the
gradual reduction in the strength of the avoidance
response over a period of hours?

² This finding has been replicated and extended in
a recent study by McGaugh, Landfield, and Dawson
entitled Delayed development of amnesia following
electroconvulsive shock: Further analysis. In preparation.


CERs in the Inhibitory Avoidance Situation 


Spevack and Suboski’s major assumption 
was that avoidance behavior in the inhibitory 
avoidance situations typically used to study 
ECS-produced retrograde amnesia is mediated 
primarily by a CER. This statement requires 
that an experimental distinction be made 
between the CER and the contingently pun- 
ished avoidance response. 

In their experiment (Spevack & Suboski, 
1967) rats were trained to bar-press for food. 
Subsequently, a footshock was delivered to one 
group of subjects while they were retrieving a 
food pellet. Both groups were then given a 
period of exposure to the experimental situa- 
tion, with the bar removed. Contrary to their 
prediction, this exposure failed to produce 
significant alleviation of fear over groups not 
so exposed. Spevack and Suboski concluded, 
however, that a CER was produced since sub- 
jects “not contingently” shocked (i.e., shocked 
while retrieving food) exhibited as much fear 
as subjects shocked contingent upon a bar 
press. However, since factors which affect 
either bar pressing or food retrieval will affect 
bar-press rates, shocks delivered following 
either response must logically exert a con- 
tingent effect upon bar-press rates. This pro- 
cedure fails to distinguish the CER from the 
specific avoidance response. One suggestion to 
improve it would be to utilize a reinforcement 
schedule in which the subject may be shocked 
when it is not. performing a response intimately 
involved in rate of bar pressing. 

Spevack and Suboski also referred to a pre- 
vious experiment by Chorover and Schiller 
(1966). The conclusions of this research indi- 
cate that when rats are shocked repeatedly in 
a confined area, significant expression of a CER 
is observed upon subsequent exposure to the 
chamber in which they were shocked. A CER 
is not produced, apparently, when the rat
receives a shock upon entry into a chamber and
is immediately allowed to escape. The employment
of this procedure (i.e., confinement and
repeated shocking) represents the sole situation
in which Spevack and Suboski may argue that
a CER is produced in passive avoidance situations.
Yet, since this procedure is, with limited
exception (e.g., Bureš & Burešová, 1963), not
typically used to study retrograde amnesia, the
hypothesis proposed by Spevack and Suboski
clearly cannot explain the wide variety of
retrograde amnesia gradients which have been
obtained in situations to which these criteria
do not apply (e.g., Dorfman & Jarvik, 1968;
Heriot & Coleman, 1962; Quartermain, Paolino,
& Miller, 1965; Weissman, 1963). In fact,
most of these last-mentioned procedures are
very similar to the situation used by Chorover
and Schiller (1965, 1966) in which Spevack and
Suboski assumed there was little evidence of a
CER. Moreover, they are also similar to the
procedures in which avoidance response incubation
has been studied. As a consequence,
their analysis of the parallelism between incubation
and ECS gradients is inappropriate
since the incubation gradients to which they
refer are gradients of avoidance, not specifically
gradients of CER incubation.


Gradients of Avoidance, CER Incubation, and
Retrograde Amnesia


As was previously mentioned, the incubation
gradients to which Spevack and Suboski
referred do not distinguish between the CER
and the avoidance response. In any event, there
is no necessary correspondence between incubation
gradients and retrograde amnesia gradients.
Incubation gradients have been biphasic
(Irwin & Benuazizi, 1966; Zerbolio, 1969) and
U-shaped (Irwin, Kalsner, & Curtis, 1964).
This is to be contrasted with retrograde
amnesia gradients which have typically been
shown to increase monotonically (e.g.,
Dorfman & Jarvik, 1968; Heriot & Coleman,
1962; Quartermain et al., 1965). In studies in
which incubation and retrograde amnesia
gradients have been examined in the same experiment
(Pinel & Cooper, 1966a, 1966b) there
is no necessary correspondence between the
two.

Obviously, more information is needed to
establish what the CER is, and the behavioral
indexes of it (e.g., crouching, freezing,
urination, and defecation) [...]. Pinel (1969b)
has recently provided
some evidence that a CER may be
produced following a single footshock in rats
trained in a drinkometer situation. The mea- 
sure used to assess the CER was percentage of 
animals freezing at least once on the retest. 
The data indicate that after 10 seconds, 1
minute, 10 minutes, or 1 hour the CER is weak 
(25% freezing); then it is much stronger at 
3 hours (85% freezing). Yet under these cir- 
cumstances the percentage of animals avoid- 
ing the drinking spout for 500 seconds was 
65% at 10 seconds and over 80% at all other 
test intervals. Therefore, the CER measure is 
in no sense as predictive of the strength of the 
avoidance response as Spevack and Suboski 
assumed. Interestingly enough, in this experi- 
ment the CER gradient, not the avoidance 
gradient (Pinel & Cooper, 1966b), roughly 
parallels the ECS gradient. But it must be 
emphasized, there can be no necessary corre- 
spondence between ECS gradients and retro- 
grade amnesia gradients as they are presently 
assessed, since the ECS gradient may be 
changed drastically by changing the treatment 
parameters. These variations can have no effect 
upon the incubation curve since incubation is 
assessed independently on the basis of performance
at various times following the training
experience. 

Spevack and Suboski did attempt one direct 
test of their hypothesis that “any experimental
manipulation which affects the ECS gradient 
will have a similar effect on the incubation 
gradient and vice versa [p. 71].” To test the 
assumption they (Suboski et al., 1969) utilized 
a one-trial discrimination paradigm, yet again 
only considered the avoidance response. They
also abandoned the use of a “confinement 
shock” procedure (since subjects did not dis- 
criminate well under these circumstances, 
according to the data of Suboski et al.) in favor 
of an “entrance shock” procedure. Thus, they 
adopted a situation in which the experimental 
production of a CER cannot logically be impli- 
cated on the basis of their past analysis of the 
conditions necessary to produce a CER (cf. 
Chorover & Schiller, 1966). 

Rats were shocked upon entry from a large 
field into one of two compartments. They were 
subsequently tested at 100 seconds and 3,160 
seconds following training or given ECS at 
these time points and tested 24 hours later. If
they were allowed a choice on the subsequent
test, latency to respond was higher at 100 seconds
than at 3,160 seconds for both groups. If,
on the other hand, they were forced to the
chamber in which they received shock, latency
to respond was lower at 100 seconds than at
3,160 seconds. The authors argued:


if merely the avoidance response were increasing in 
strength, latencies could hardly be both increasing and 
decreasing as a function of time following avoidance 
conditioning. Rather, incubation of some central state 
is required [Suboski et al., 1969, p. 72]. 


Then they implicated the CER. Spevack and 
Suboski failed to consider that in this paradigm 
there are not one, but two possible avoidance 
responses. In the “forced” condition, subjects 
may only avoid the shock chamber, but in the 
“choice” condition subjects may also enter the 
alternate chamber. Moreover, an increased 
tendency to avoid the shock chamber over
time, mediated by memorial processes, may be 
expected to produce increased latencies in the 
forced condition and decreased latencies in the 
choice condition. Increased crouching and 
freezing over time, on the other hand, would be 
predicted to cause increased latencies to re- 
spond over time in both the forced and the 
choice conditions. Thus, not only did Spevack 
and Suboski fail empirically to implicate the 
CER in this paradigm, but increased strength 
of the CER is totally incompatible with these 
findings. Retrograde amnesia for the place 
where footshock was delivered is, however, in 
no sense incompatible with these results. This 
study contains an interesting method of dis- 
tinguishing between memory and CERs, how- 
ever, and the experiment should be repeated 
with care to ensure that a CER is produced. 
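
The logic of this comparison can be laid out as a small lookup. The sketch below simply encodes the directions of latency change stated in the text for the forced and choice conditions and checks which account predicts the observed pattern; the dictionary keys and labels are illustrative names, not terms used by Suboski et al. (1969).

```python
# Direction of change in response latency from the 100-second test
# to the 3,160-second test, as described in the text.
OBSERVED = {"forced": "increase", "choice": "decrease"}

PREDICTIONS = {
    # A stronger tendency to avoid the shock chamber slows entry when
    # forced, but speeds entry into the alternate chamber when a choice exists.
    "memory for place of shock": {"forced": "increase", "choice": "decrease"},
    # More crouching and freezing would slow responding in both conditions.
    "CER strengthens over time": {"forced": "increase", "choice": "increase"},
}

def compatible(account):
    """True when the account predicts the observed direction in both conditions."""
    return PREDICTIONS[account] == OBSERVED

for account in PREDICTIONS:
    print(account, "->", "compatible" if compatible(account) else "incompatible")
```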

Obviously, the possibility of a change in the 
strength of an avoidance response is an im- 
portant consideration in inhibitory avoidance 
situations. However, at this point the possible 
contribution of a CER to the avoidance re- 
sponse is unclear. The overwhelming evidence 
suggests that incubation gradients bear no 
necessary correspondence to retrograde am- 
nesia gradients obtained with ECS. In view of 
this fact, it seems paradoxical when Spevack 
and Suboski suggested that “A number of 
investigators have chosen to regard incubation 
as direct evidence for memory consolidation 
(e.g., McGaugh, 1966) [p. 72].” McGaugh
made this distinction quite clearly:


increases in retention found with increases in time do 
not depend solely upon consolidation processes as 
indexed by electroshock effects. For all intervals up to 
one hour that were investigated, the performance of 
animals tested [italics added] at the end of the interval 
in question was superior to that of comparably trained 
animals given an electroshock at the end of the interval 
and a retention test the following day [McGaugh,
1966, p. 6].


Conclusions 


From the analysis it seems obvious that 
Spevack and Suboski set up a dichotomy of 
long and short ECS gradients based upon the 
misconception that particular ECS disruption 
gradients may be used to titrate the processes 
of memory consolidation. However, they 
failed to consider that these gradients are sus- 
ceptibility gradients rather than consolidation 
gradients, and in any learning situation there 
is not one, but a range of gradients which reflect 
treatment effectiveness in addition to the pro- 
cesses susceptible to disruption. It is my con- 
tention that, in general, the wide variety of 
retrograde amnesia gradients reflects variations
in treatment effectiveness, although the
use of arbitrary latency of response cutoffs can 
also be shown to change the measured retro- 
grade amnesia gradient. 

The specific hypothesis, that incubation of a 
CER and its subsequent halting by ECS pro- 
vide an explanation of the variety of ECS 
gradients, is not supported by the evidence. 
The hypothesis is obviously incompatible with 
the observations of long retrograde amnesia 
gradients in one-trial appetitive learning situa- 
tions. Moreover, Spevack and Suboski failed to 
provide evidence of CER involvement in the 
inhibitory avoidance tasks typically used to 
study retrograde amnesia. They failed to 
specify the nature of CER incubation per se or 
provide convincing evidence that a single ECS 
halts incubation of a CER. Finally, in the only 
specific test of this hypothesis (Suboski et al., 
1969), the results are totally incompatible with 
the CER incubation notion. Thus, the hypothesis
not only fails to explain the authors'
own data, or the data from appetitive studies,
but also the majority of the consolidation
studies which utilize ECS. It is apparent that
incubation phenomena contribute under some
circumstances to the performance level of ani- 
mals that have been punished. Yet from 
Spevack and Suboski’s treatment of this prob- 
lem we are given little information as to 
whether incubation reflects an underlying 
CER, under what circumstances a CER is 
produced, or whether ECS may affect incuba- 
tion. Tests of the hypothesis have in the main 
been difficult operations in an area where there 
is no presently workable criterion for distin- 
guishing between the CER and the specific 
avoidance response. 


BASIC ISSUES IN REVERSAL-SHIFT BEHAVIOR:
A REPLY TO KENDLER AND KENDLER

L. R. GOULET1


University of Illinois 


Selected nonmediational factors which affect performance on reversal-shift tasks are
discussed. The factors highlighted are those associated with the logical operation of
reversing and the role of implicit associative responses in discrimination-shift tasks
involving conceptually related materials. It is concluded that the reversal-shift
versus half-reversal-shift design is not appropriate for use in assessing the unitary
effects of mediation or in a test of a mediational model of development. The joint
operation of mediational and nonmediational factors in the execution of the reversal
shift is also discussed.


Kendler and Kendler (1969) have responded
to two recent papers (Slamecka, 1968; Wolff,
1967) which have questioned the legitimacy
of attributing observed age differences in performance
on reversal-shift and half-reversal-shift
tasks to developmental changes in mediated
symbolic behavior. Slamecka's paper
highlighted sources of bias which mitigate
against mediation-based interpretations of
the results of discrimination-shift studies and
was not directly concerned with developmental
processes. In reply, Kendler and Kendler
suggested: (a) A fundamental concern of
theirs is to account for developmental changes
in mediated symbolic behavior; (b) the theoretical
framework which has guided and
justified their research dictates the use of
certain shift paradigms (e.g., reversal-shift and
half-reversal-shifts); and (c) the results of a
combination of recent studies using the optional-shift
design (e.g., Kendler & Kendler,
1968) and the reversal-shift versus half-reversal-shift
design (e.g., Kendler, Kendler, &
Marken, 1969) have uncovered a developmental
law extending over the age range from
4 to 18 years which, they believe, “reflects
changes in mediated symbolic responses [Kendler
& Kendler, 1969, p. 231].”

The present paper does not question the
reliability of the age differences in performance
obtained by the Kendlers and their associates;
nor does it take exception to the theoretical
model which they espouse. Rather, it questions
whether these age differences do, in fact, reflect
the unitary and age-correlated contribution
of mediational processes (as implied by
their model), and whether the alternate
methods for studying performance on reversal
shifts actually result in the measurement of
the same underlying process if admitted sources
of confounding (e.g., dimensional dominance,
intermittent reinforcement) are controlled.
With these referents, two additional rule-based,
but nonmediational, explanations of
performance in reversal-shift tasks are discussed.
One explanation relates to the utilization
of preexperimentally acquired habits such
as "switching the responses," or "doing the
opposite" as a factor (or sole basis) in the
execution of the reversal shift. The second
explanation relates to an extension of the
frequency theory of verbal-discrimination
learning to the reversal-shift task.


1 Requests for reprints should be sent to L. R.
Goulet, Training Research Laboratory, 8 Lincoln Hall,
University of Illinois, Urbana, Illinois 61801.


Response-Switching


Bogartz (1965) had young adult subjects
learn two consecutive paired-associates lists in
which randomly grouped sets of four stimuli
(consisting of unrelated trigrams) were associated
to one of two alternate responses. The
reversal shift required the reversing of the
responses to each of the stimulus sets, whereas
the half-reversal shift involved reversing
half of the responses in each set.
The reversal shift was executed more rapidly
than the half-reversal shift [...]
mediational interpretation [...]
experimental query [...]
that 62.5% reported the [...]
“do the opposite.”

Marquette and Goulet [...]
experiment by Bogartz [...]
treatments designed to determine the existence
of mediation under conditions which precluded
the use of a rule such as “do the opposite”
on the shift task. They reasoned that facilitative
effects of mediation should occur even if
new responses were used on the reversal-shift
task. The only requisite is that the same
stimulus groupings be maintained on the training
and the shift. The results replicated those
of Bogartz for treatments involving either 
identical or changed responses on the shift 
task and provided clear evidence for the 
existence of mediation in the reversal-changed 
treatment (Table 1). 

The data obtained by Bogartz and Mar- 
quette and Goulet do not speak to the issue 
regarding developmental changes in the use of 
mediation in analogous tasks, the main thrust 
of Kendler and Kendler's (e.g., 1962, 1969) 
model. However, the results of a parallel study 
(Goulet & Williams, in press) using first- and 
third-grade children are available.

Table 1 indicates that the reversal shift was
executed more rapidly than the half-reversal
shift under conditions where the responses
were maintained (reversal-identical and half-reversal-identical
treatments) on the shift.
However, the opposite was true for the reversal-changed
and half-reversal-changed treatments.
This pattern of results is apparent for
both groups of children and suggests by exclusion
that the rule-based strategy, “do the
opposite,” formed the basis of the rapid solution
of the reversal shift. Alternative explanations
of the results, such as the disruption of
performance in the reversal-changed treatment
because of the introduction of novel responses
on the shift tasks, are unlikely because the
same conditions were apparent in the half-reversal-changed
treatments.


TABLE 1
MEAN PERCENTAGE OF ERRORS ON SHIFT TASKS

                        Shift treatments
Group              R-I      HR-I     R-C      HR-C

Grade 1 a          25.00    37.12    52.25    37.25
Grade 3 a          15.62    32.62    37.62    39.87
Young adults b     32.67    44.47    32.50    47.15

Note.—Abbreviations are: R = reversal; I = identical;
HR = half reversal; C = changed.
a Data are taken from Goulet and Williams (in press) and
represent shift errors on Trials 1-10.
b Data are taken from Marquette and Goulet [...] and
represent [...] different base for con[...] adjust for
differences [...] young adults.

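
The comparison the text draws from Table 1 is a difference between half-reversal and reversal error percentages within each response condition. A minimal sketch of that arithmetic follows; the values are copied from Table 1 as printed here, and the function name is an illustrative assumption. A positive difference means fewer errors on the reversal shift.

```python
# Mean percentage of errors from Table 1, in the order R-I, HR-I, R-C, HR-C.
TABLE_1 = {
    "Grade 1": (25.00, 37.12, 52.25, 37.25),
    "Grade 3": (15.62, 32.62, 37.62, 39.87),
    "Young adults": (32.67, 44.47, 32.50, 47.15),
}

def reversal_advantage(group):
    """Half-reversal minus reversal errors: (identical responses, changed responses)."""
    r_i, hr_i, r_c, hr_c = TABLE_1[group]
    return round(hr_i - r_i, 2), round(hr_c - r_c, 2)

for group in TABLE_1:
    identical, changed = reversal_advantage(group)
    print(f"{group}: identical={identical}, changed={changed}")
```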

The major issue concerns the lack of ob- 
served mediation in the Goulet and Williams 
study. One hypothesis concerns the role of 
preexperimentally acquired representational 
responses in the mediational process. That is, 
mediation may not be obtained when unre- 
lated and randomly grouped stimuli are used. 
Kendler and Kendler (1969) apparently favor 
this explanation even to account for the lack 
of performance differences in reversal-shift and 
half-reversal-shift tasks with young adult 
subjects. 

A second and not so apparent explanation 
is related to the number of mediators which the 
children must utilize in the reversal-shift task. 
In early research, a mediation-based solution 
to the reversal shift involved the utilization of 
a single mediator (see Kendler & Kendler, 
1962). Tasks of the nature used by Goulet and 
Williams (and by Kendler et al., 1969, and 
Kendler, Kendler, & Sanders, 1967), at the 
very least, require the utilization of two media- 
tors (one for each stimulus set) in the solution 
of the reversal shift. The increased difficulty 
of such problems can be documented by noting 
that the superiority of reversal shifts over 
nonreversal shifts is greatly diminished with 
young adult subjects when there are four 
response categories represented in the tasks 
(Kendler & Mayzner, 1956; Ludvigson & 
Caul, 1964). The possibility remains that the 
increased difficulty in using the additional 
mediator in tasks of this nature disrupts the 
performance of children, especially if the 
stimuli are unrelated (Goulet & Williams, in 
press). 


Reversal Shifts and the Processes in Verbal- 
Discrimination Learning 


This section of the paper deals primarily 
with a study Kendler and Kendler (1969) 
used to buffer their position regarding the 
sensitivity of the reversal-shift paradigm to
mediation and to developmental changes in 
mediated symbolic behavior. The procedure 
used by Kendler et al. (1969) required subjects 
to sort two sets of conceptually related pictures 
(e.g., cow, dog; apple, banana) into separate 
classes (animals, fruit) on the preshift. The
reversal shift involved switching the sorting 
response, whereas the half-reversal shift in- 
volved sorting in a conceptually mixed fashion 
(e.g., cow, apple; dog, banana). Kendler et al. 
(1969) found that the performance differences 
on the reversal and half-reversal shifts in- 
creased with age. While these age differences 
may reflect changes in mediated symbolic 
behavior, two alternative and nonmediational 
explanations are possible. 

The first explanation, response switching, 
has already been discussed. The favored inter- 
pretation here is based on theory developed 
to explain verbal-discrimination processes 
(Ekstrand, Wallace, & Underwood, 1966). The 
theory assumes that discriminations are based 
on frequency cues which accrue to individual 
items as a function of perceiving, pronouncing, 
or rehearsing the items in the list. Ekstrand 
et al. also provided convincing evidence that 
frequency units which accrue to an individual 
item may be indirectly increased through 
responding to associatively related items which 
occur elsewhere in the list. Thus, if sets of 
conceptually related items form the instances 
in the categories to be sorted on the preshift 
task, an overt or covert response to any one 
of these items will increase the frequency of 
that item and the other items (through an 
implicit associative response) in the category. 
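
The frequency account can be sketched as a simple tally. The sketch assumes, purely for illustration, that each response adds one frequency unit to the item itself and a smaller fixed amount to every other item in the same category; the weight of 0.5 and the function name are assumptions, not values from Ekstrand et al. (1966). The items are the example pictures mentioned above.

```python
from collections import defaultdict

CATEGORIES = {"cow": "animals", "dog": "animals",
              "apple": "fruit", "banana": "fruit"}

def accrue_frequency(responses, implicit_weight=0.5):
    """Tally frequency units per item, including implicit associative responses."""
    freq = defaultdict(float)
    for item in responses:
        freq[item] += 1.0                      # direct response to the item
        for other, category in CATEGORIES.items():
            if other != item and category == CATEGORIES[item]:
                freq[other] += implicit_weight  # assumed implicit associative response
    return dict(freq)

freq = accrue_frequency(["cow", "cow", "apple"])
print(freq)
```

In this sketch "dog" accrues frequency without ever being responded to, so items within a category tend to rise together, which is the property the argument relies on.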

It is clear that the occurrence of implicit
associative responses would interfere with per- 
formance on the half-reversal-shift task and 
facilitate reversal-shift performance because 
of the conceptually mixed and unmixed nature 
of the sorting categories, respectively. Since 
the strength and complexity of language habits 
is expected to vary with age, it is also reason- 
able to expect such differences to be reflected 
in the performance on shift tasks which involve 
conceptually related materials. 


Conclusions 


The present paper has highlighted the existence
of alternative factors which likely affect
age-correlated performance differences on reversal
and half-reversal shifts. It is emphasized,
however, that mediational and nonmediational
mechanisms may operate jointly in determining
shift performance. For example, a mediational
strategy and response switching
may be used simultaneously in executing the
reversal shift. Alternatively, subjects may use 
nonmediational and mediational strategies 
successively on the reversal-shift task. In this
case, it is possible that response switching is 
used early in practice, but that mediational 
mechanisms are of primary importance in the 
later stages of practice. However, even in this 
case, the study of mediation and providing
empirical tests of a mediational model of
development demands that alternate strategies
are eliminated or that an independent assessment
of the role of nonmediational factors is
available. In the absence of the latter, the 
proposal by Slamecka (1968) to use the “total 
change” design in research of this nature must 
be implemented. 

It is also important to note that the variety 
of tasks used in the study of performance on 
reversal shifts does not necessarily yield measures
of the same underlying processes. The
optional-shift method may yield measures of
one combination of processes (e.g., Jeffrey,
1965; Kendler & Kendler, 1968), and the
reversal shift (e.g., using the paired-associates 
method) may yield a measure of the effects 
of another set of (possibly overlapping) processes.
This suggestion is highlighted by noting
differential effects of implementing design
modifications similar to those suggested by
Slamecka. When Jeffrey changed the form of
the stimuli (and implicitly the subjects' responses)
after original learning, the proportion
of children who made a reversal shift was
increased (relative to a condition where the
stimuli were unchanged). Goulet and Williams
(in press), however, found the opposite pattern
of results with the paired-associates method.

Furthermore, Kendler et al. (1967, Experiment
IV) using young adult subjects have
provided evidence that a response-switching
strategy was likely not used in a task
where sorting could be accomplished on the
basis of a superordinate, representational response.
Again, it is likely that subjects utilize
those strategies which are efficient in the
solution of the discrimination task. Unfortunately,
conventional reversal-shift designs provide
a condition where a number of alternate
strategies for solution are confounded. When
modifications of the design are implemented
to eliminate the use of alternate strategies,
evidence for mediation has been obtained, at
least when young adult subjects were used
(e.g., Slamecka, 1969). More important, however,
the nature of the developmental functions
has yet to be ascertained.


quately integrates the phenomena 
reversal-shift behavior. 


In responding to Kendler and Kendler 
(1969), Goulet (1971) neither questions the 
developmental differences in discrimination- 
shift behavior we reported nor takes exception 
to the coordinated single-unit and mediational 
stimulus-response formulation we offered to 
account for the ontogenetic changes. Instead 
he expresses reservations about our attempt to 
integrate developmental changes from two 
different experimental paradigms, the optional 
shift and the reversal versus half-reversal, 
within one theoretical framework. His main 
argument is that results from the latter para- 
digm might be interpreted in terms of two 
rule-based but nonmediational mechanisms. 

One such nonmediational mechanism is
"the utilization of preexperimentally acquired
habits such as 'switching the response,' or
'doing the opposite' as a factor (or sole basis)
in the execution of the reversal shift." Let us
consider four problems related to this
suggestion:

1. What is the meaning of a nonmediational 
mechanism? We have equated nonmediation 
with a single-unit mechanism suggested origi- 
nally by Spence (1936) in his analysis of the 

1 This paper was written when Howard H. Kendler
was a Fellow at the Center for Advanced Study in
the Behavioral Sciences, Stanford, California. The
issues discussed are outgrowths of research sponsored
by the following grants: National Science Foundation
Grant GB-6660 and Office of Naval Research Contract
Nonr-4222(04).
Requests for reprints should be sent to Howard
H. Kendler, Department of Psychology, University of
California, Santa Barbara, California 93106.


is 


diational mechanisms (doing- 
developmental changes obtained 
-shift experiments suffers from several 
v mediated and nonmediated mechanisms 
hypothesis fails to account for 
ffered in support of the 


discrimination learning of rats; environmental
cues become directly connected, in some
sense, to the instrumental response. According
to this meaning of nonmediation, stimulus
control of behavior resides in environmental
stimuli. In contrast, when mediation is involved,
stimulus control is
shifted to implicit response-produced cues.
Within this context the rule of doing-the-opposite
would qualify as a mediational mechanism
since the overt response must be, at
least partially, cued by a response of the
organism. Two forms of mediation suggest
themselves as underlying the doing-the-opposite
rule: an overt-response mechanism and an
implicit logical operation. The overt-response
mechanism would operate in a two-choice
situation when some component of the
response-produced cues for making one choice
becomes the cue for making the other choice.
The implicit logical operation would involve
some abstract principle in which the subject
implicitly guides his choice behavior by the
logical operation of reversing. Both of these
mediational mechanisms are distinct from the
conceptual form of mediation involved in developmental
changes in reversal-shift behavior. We have
stressed that form of mediation in which the
discriminanda generate representational responses, as
when pictures of a dog and lion are
transformed into the representational response
of animals.


REPLY TO GOULET 291 


2. The evidence suggests that the doing-the- 
opposite hypothesis when interpreted as an 
overt-response mechanism is wrong, and when 
interpreted as some logical operation, it is,
at best, incomplete. Although Goulet chooses
to ignore the optional-shift paradigm, it must 
be noted that the overt-response form of the 
doing-the-opposite hypothesis is incapable of 
explaining the results from such a design. 
During the optional-shift stage of this para- 
digm, the subject is required to reverse the 
responses acquired during the initial training 
stage, but this reversal is consistent with both 
a reversal and extradimensional shift. Thus 
doing-the-opposite from what was done during 
initial training would neither favor reversal 
nor extradimensional choices and thus could 
not explain the increased tendency with age 
to select a reversal shift (Kendler & Kendler, 
1970; Kendler, Kendler, & Learnard, 1962). 
The overt-response form of the hypothesis 
also is unable to explain the findings of a 
recently published study (Kendler, Kendler, & 
Marken, 1970) that a reversal shift with new 
words from the same two conceptual categories 
used during preshift training was executed as 
rapidly as when the words used during both 
preshift and postshift training remained unchanged.
Not having learned any choice responses
to the new words precludes doing-the-opposite.
If the doing-the-opposite hypothesis,
either in its overt-response or logical operation
form, were the "sole basis" for executing
a reversal shift, reversal shifts with sets of
trigrams should be as fast as with sets of words
from the same conceptual category, a prediction
that is at odds with the facts (Kendler,
Kendler, & Sanders, 1967).

The logical operation form of the doing-the- 
opposite hypothesis can be improved by assum- 
ing that its effectiveness in mediating reversal 
behavior depends on the availability of rep- 
resentational labels. The findings of the studies 
just cited as being opposed to the overt- 
response hypothesis could be incorporated 
within the expanded version of the logical 
operation assumption that includes representational
responses. But, as Goulet notes, when the
influences of doing-the-opposite and mediated 
representational responses are pitted against 
each other, the latter proves to be the more 


dominant (Kendler et al., 1967, Experiment 
IV). 

3. Against the weight of evidence cited in 
Point 2, Goulet offers the results of an un- 
published study that indicates for first- and 
third-graders a "reversal shift was executed 
more rapidly than the half-reversal shift under 
conditions where the responses were main- 
tained . . . on the shift. However the oppo- 
site was true for the... treatments [in 
which the responses were changed during postshift
training]." Thus, the suggestion is offered
that the children in the unchanged response 
condition used the doing-the-opposite rule in 
the absence of any mediating representational 
responses; when new responses are required 
which preclude the use of the rule, the children 
find a half-reversal easier. These data suffer 
from two limitations. First, when new re- 
sponses are added in postshift training, the 
design, strictly speaking, cannot be considered 
a reversal-shift study. In our analysis of re- 
versal behavior we have emphasized the im- 
portance of the fact that when the subject in 
postshift training shifts for the first time from 
his preshift mode of responding, he is automati- 
cally reinforced. A rapid reversal shift occurs 
when the appropriate mediating response per- 
sists while the previously correct choice is 
abandoned. The second insufficiency is that 
the evidence offered cannot be scrutinized 
with the same care as published procedures 
and results are. But even the minimal informa- 
tion provided raises doubts about Goulet's 
conclusions. The half-reversal, when new 
responses are introduced, is not executed more 
rapidly than a reversal for both first- and 
third-graders as Goulet states. Third-graders 
had a higher percentage of errors (39.87) when 
half-reversing than when reversing (37.62).
One can raise the question of whether it is 
appropriate to talk about speed of reversing 
with the unconventional response measure 
Goulet uses, the percentage of shift errors on 
Trials 1 to 5. Typically, in reversal versus 
nonreversal or reversal versus half-reversal 
studies, postshift learning is measured in terms 
of the number of trials (or sometimes errors) 
to criterion. The reader is left in the dark about 
the speed with which Goulet’s subjects reached 
the criterion of postshift learning and the relation
between his unconventional response measure
and more traditional ones. Finally, reservations
can be expressed about Goulet's logic
when he argues that the results of his unpublished
study3 could not be due to the disruption
produced by introducing new responses during 
postshift training. He assumes that the half- 
reversal subjects would experience the same 
disruption as reversal subjects. Perhaps switch- 
ing to new choice responses would have a 
special effect upon mediated symbolic re- 
sponses of the young children and therefore 
would retard a reversal relatively more than a 
half-reversal. It should be noted that a develop- 
mental trend is suggested in the results of the 
discrimination-shift studies with changed re- 
sponses; as age increases, the relative superi- 
ority of reversal over half-reversal increases. 
4. The above criticism of the doing-the- 
opposite hypothesis should not be interpreted 
as denying that following a reversal shift many
young adults report that they “reversed their 
responses" (Bogartz, 1965). This is a common, 
although not universal, introspective report 
given during postexperimental questioning. The
issue of the relationship between the report of 
the phenomenal experience during and after 
problem solving is a most complex one, much 
too complicated to be discussed briefly except 
to note that it shares some of the qualities 
of the ancient riddle as to which came first, 
the chicken or the egg. Does the phenomenal 
experience explain the problem solving or vice 
versa? Is the introspective report an explana- 
tion or something to be explained? Only by 
developing independent measures of hypothe- 
ses and choice behavior at different stages of 
training will it be possible to understand their 
interaction in discrimination-shift studies. 


Although Goulet accepts the doing-the- 
opposite hypothesis to explain his unpublished 
results "by exclusion" of presumably other
hypotheses, he nevertheless is able to generate 
another explanation, in this case, a media- 
tional one, namely, more than one mediator is 
required in his unpublished study and in some 
of the recent studies that use conceptual in- 
stances in the form of words (Kendler et al., 

3 Although Goulet's "unpublished study" (Goulet &
Williams; see preceding article) is currently in press in
the Journal of Experimental Child Psychology, when the
present authors were given an opportunity to respond
to Goulet, his manuscript was unpublished. Therefore,
his results and data are referred to as unpublished
throughout the present article.
1967) or pictures (Kendler, Kendler, &
Marken, 1969). The problem of the number of
mediators to represent a group of instances is
a significant problem and has been investigated
in adults (Kendler & Watson, 1969). Goulet is
wrong in believing that the number of mediators
is the variable responsible for the reductions
in the superiority of reversal shifts over
nonreversal shifts when four instead of two
sorting categories are used. As noted, with two
choices a shift from the previously correct
response automatically produces a correct
choice consistent with a reversal shift. Researchers
in discrimination shift should recognize
that a reversal shift in a two-choice situation
involves both response and stimulus reversals,
whereas in a four-choice situation only stimulus
reversals are possible.

Although Goulet mentions the doing-the-opposite
hypothesis first to explain reversal-shift
results, he admits to favoring the frequency
theory of verbal discrimination (Ekstrand,
Wallace, & Underwood, 1966). This
theory, based upon language behavior and
verbal associations, is classified by Goulet as
nonmediational, but in terms of our framework
would represent another form of mediation.
The influence of this formulation is only
sketched out, and we will look forward with
interest to its empirical and theoretical development
to account for ontogenetic changes in
discrimination-shift behavior. Certainly our
interpretation, which obviously needs additional
refinement, is not the only
theory that can be proposed. Nor [...]
that other processes may be [...]
ERRATA


In the article, “A Model for Partitioning Judgment Error in Psychophysics" by 
Richard K. Eyman and P. J. Kim in the July 1970 issue, Table 5 on page 45 contains 


two errors. 


1. The F ratios reported in Table 5A, “Placement of Standard X Stimulus Levels," 
were for mildly retarded patients at the hospital rather than for volunteers from the 
research staff. The proper values for Table 5A are as follows: 


TABLE 5

ANALYSIS OF VARIANCE OF MEAN RESPONSE AND
THREE MEASURES OF RESPONSE VARIATION


Experimental 


Response measure 


condition 
Auk 


li.w—Sil/Se v| Sisk 


A. Placement of Standard X Stimulus Levels 


Category A vs. C
Standard effect (I)
Stimuli (J)
I × J

Magnitude A vs. C
I
J
I × J

Category B vs. D
I
J
I × J

Magnitude B vs. D
I
J
I × J


1,33 3.47 
EI 1.22 
1.22 7.61 
03 7.59 
62 02 
2.01 2.33 
10.06 65 
1:95. 73 
10.76* 1.98 
RU! 4.81 
1.24 M 
04 91 


*α = .01.

Statements in the text regarding Table 5A, namely, the first paragraph in the second
column on page 43, and the first paragraph in the first column on page 46 should be
disregarded.


2. The F ratios reported for Conditions B and C in Table 5B, “Scaling Method 
X Stimulus Levels,” were reversed. Statements in the text regarding Table 5B are 
correct if this reversal is taken into account. 

The general conclusions of the article are unaffected by these errors.
Some generalizations are given of a least squares method


fit two intercepting lines to a set of data 
considered instead of straight lines. 
duced. Methods are given for locating 
polynomial in a portion of the range o 
polynomial relates the dependent varial 


of the range of the independent variable. A decision 
the polynomials under consideration, based on a seq 


In a previous report (Bogartz, 1968), a simple
extension of the method of least squares was
presented for locating a critical value $X_c$ when
the dependent variable $Y$ lies on one line segment
$Y = a_1 X + b_1$ if the independent variable
$X$ is less than or equal to $X_c$ but lies on
another line segment $Y = a_2 X + b_2$ if $X > X_c$,
given the restraint $a_1 X_c + b_1 = a_2 X_c + b_2$. Although
some minor complications can arise,
the basic approach was quite simple and
straightforward. Since the location of $X_c$ is not
known, all possible locations are considered.
That location which minimizes the sum of
squared deviations from the regression function
is chosen. Exploration of all possible locations
is a simple matter achieved by partitioning the
observed set of $X$ values in the following manner.
If the observed $X_i$ are $X_1, X_2, \ldots, X_n$,
then for each possible value of $i$, $X_1, \ldots, X_i$ form
one set of $X$s and $X_{i+1}, \ldots, X_n$ form the other
set. The standard techniques of linear regression
are then applied to the two sets separately.
The sum of squares for deviations from linear
regression are found for each set separately and
added. That value of $i$ is used which yields the
smallest total sum of squares for deviations
from regression.
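
The partition search just described can be sketched in a few lines of Python. This is an illustration only, not the computing routine of the earlier report: the function names `line_sse` and `best_split` are hypothetical, each set is assumed to contain at least two observations with distinct $X$ values, and the restraint that the two lines meet at $X_c$ is not imposed here (the intercept would be checked afterward).

```python
def line_sse(xs, ys):
    """Sum of squared deviations from the least squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def best_split(xs, ys):
    """Try every partition of the ordered data into two sets.

    Returns (i, total), where i is the number of observations assigned to
    the first line and total is the summed deviation sum of squares.
    """
    pairs = sorted(zip(xs, ys))          # order the observations by X
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(2, len(xs) - 1):      # both sets keep at least two points
        total = line_sse(xs[:i], ys[:i]) + line_sse(xs[i:], ys[i:])
        if best is None or total < best[1]:
            best = (i, total)
    return best
```

Given ordered observations, `best_split(xs, ys)` returns the partition index with the smallest total; the critical value $X_c$ then lies between the $i$th and $(i+1)$th ordered $X$ values.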

This report presents some interesting gen- 
eralizations of the previous work based on the 
same simple approach. In all cases we are con- 
cerned with polynomial regression rather than 
linear regression, so that the previous results 
will in fact be a special case of the results given 
here. In polynomial regression we have 


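$$Y = a_0 + a_1 X + a_2 X^2 + \cdots + a_j X^j. \tag{1}$$

The expression on the right side of Equation 1
is a polynomial of order $j$ and is denoted by
$P(j)$. Given a data vector $Y$ of $n$ values of the
dependent variable and a corresponding vector
of $n$ $X$ values, the polynomial regression
model is

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}
1 & X_1 & X_1^2 & \cdots & X_1^j \\
1 & X_2 & X_2^2 & \cdots & X_2^j \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_n & X_n^2 & \cdots & X_n^j
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_j \end{bmatrix} +
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

or $Y = X_j'A_j + e$. It is well known that the
least squares solution for the coefficients,
$A_j$, is $A_j = (X_j X_j')^{-1} X_j Y$ and that the
sum of squares for deviations from
regression is $SS_e = Y'Y - A_j'X_jY$.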
Case I: Arbitrary pair of polynomials $P_1(j)$
and $P_2(k)$, $j$ and $k$ given. In this case we seek
the best $X_c$ such that

$$Y = P_1(j) = a_{10} + a_{11}X + a_{12}X^2 + \cdots + a_{1j}X^j \quad \text{if } X \le X_c,$$

$$Y = P_2(k) = a_{20} + a_{21}X + a_{22}X^2 + \cdots + a_{2k}X^k \quad \text{if } X > X_c.$$

The procedure is straightforward and completely
analogous to the treatment of two
arbitrary lines which is the special case

$$Y = P_1(1) \quad \text{if } X \le X_c,$$

$$Y = P_2(1) \quad \text{if } X > X_c.$$

If there are $n$ values of $X$, fit $P_1(j)$ for $X_1, \ldots, X_i$
and $P_2(k)$ for $X_{i+1}, \ldots, X_n$, letting $i$ range from
$j + 1$ to $n - k - 1$. The number $j + 1$ is the
smallest value of $i$ such that $P_1(j)$ can be fit,
and the number $n - k - 1$ is the largest value
of $i$ such that $P_2(k)$ can be fit. At each value of
$i$, find $Y_1'Y_1 - A'_{j,1}X_{j,1}Y_1$, the error sum of
squares for $P_1(j)$, and $Y_2'Y_2 - A'_{k,2}X_{k,2}Y_2$,
the error sum of squares for $P_2(k)$. The
total sum of squares for deviations from regression
is then $(Y_1'Y_1 - A'_{j,1}X_{j,1}Y_1) + (Y_2'Y_2 - A'_{k,2}X_{k,2}Y_2)$.
Choose that value of $i$ which
yields the smallest total sum of squares for
deviations from regression. The two polynomials
should intercept each other at $X_c$,
where $X_i \le X_c < X_{i+1}$. $X_c$ can be found by
setting $P_1(j) = P_2(k)$ and solving for $X$. If
the intercept does not fall in the correct interval,
procedures analogous to those described
by Bogartz (1968) can be followed.
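
As a hedged illustration of the Case I search, the following Python sketch fits the two polynomials at every admissible partition and keeps the one with the smallest total. The function names (`solve`, `poly_fit`, `poly_sse`, `best_split_poly`) are hypothetical, the data are assumed to be ordered by $X$ with distinct $X$ values, and the deviation sum of squares is computed directly from the residuals rather than as $Y'Y - A'XY$; the two are algebraically the same at the least squares solution.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def poly_fit(xs, ys, order):
    """Coefficients a_0..a_order from the normal equations (X X') A = X Y."""
    rows = [[x ** p for x in xs] for p in range(order + 1)]   # rows of X
    xxt = [[sum(u * v for u, v in zip(r1, r2)) for r2 in rows] for r1 in rows]
    xy = [sum(u * y for u, y in zip(r, ys)) for r in rows]
    return solve(xxt, xy)


def poly_sse(xs, ys, order):
    """Sum of squared deviations from the fitted polynomial of given order."""
    coef = poly_fit(xs, ys, order)
    return sum((y - sum(c * x ** p for p, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))


def best_split_poly(xs, ys, j, k):
    """Search i = j+1, ..., n-k-1; return (i, total deviation sum of squares)."""
    n = len(xs)
    best = None
    for i in range(j + 1, n - k):
        total = poly_sse(xs[:i], ys[:i], j) + poly_sse(xs[i:], ys[i:], k)
        if best is None or total < best[1]:
            best = (i, total)
    return best
```

With `j = k = 1` the search reduces to the two-line case of the previous report. The intercept of the two fitted polynomials would still have to be located and checked against the interval, as the text notes.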

Case II: Locating a region of disturbance
following the polynomial $P_2(k)$ in a function
which is otherwise the polynomial $P_1(j)$. As an
example of this case we can take the situation
where we expect that $Y$ will be a linear function
of $X$ throughout the range of $X$ except for a
region of disturbance from $X_c$ to $X_d$ in which $Y$
will follow a third-order polynomial. The problem
is to locate $X_c$ and $X_d$. In general, the
model is

$$Y = \begin{cases} P_1(j) & \text{if } X \le X_c \text{ or } X > X_d, \\ P_2(k) & \text{if } X_c < X \le X_d. \end{cases}$$

The strategy analogous to Case I is obvious.
The model admits the extreme possibility that
the region of disturbance completely covers the
range of $X$. This possibility can be explored by
finding the sum of squares for deviations from
regression if $P_2(k)$ is used for all $X$. Then all
possible locations for $P_2(k)$ are considered,
fitting $P_1(j)$ to all $X$s outside that region.
The region giving the smallest total sum of
squares for deviations from regression, defined
as for Case I, is then selected as the region of
disturbance.

Case I and Case II are obviously general- 
izable further. In Case I, an arbitrary number 
of polynomials can be considered. In Case II, 
an arbitrary number of disturbance regions can 
be considered, etc. 

Case III: Arbitrary polynomials of unknown
order. The situation here is as in Case I except
that $j$ and $k$ are unknown. The decision process
involves not only location of $X_c$ but also the
order of the polynomials which are to be fit.
The approach suggested here is to once again
sequentially consider all possible locations of
$X_c$. For each $X_c$ candidate, a $P_1(j)$ and a
$P_2(k)$ must be fit according to a decision rule
for finding $j$ and $k$. Such a decision rule can be
obtained if we introduce a normality and independence
assumption for the errors. Assume
that for each polynomial the error vector has a
multivariate normal distribution with vector of
means equal to the zero vector and covariance
matrix equal to $\sigma_1^2$ times an identity matrix
for $X \le X_c$ and $\sigma_2^2$ times an identity matrix
for $X > X_c$.

The rule will be that in fitting $P_1(j)$ to the
data for $X \le X_c$, begin with a polynomial
$P_1(J)$, where $J$ is the largest value of $j$ which
will be considered as a possible value. This
maximum value $J$ will be determined either
a priori or by the number of $X$ values less than
or equal to the $X_c$ candidate. Now test the
hypothesis $H_J$ that $P_1(J)$ fits the data for
$X \le X_c$ better than $P_1(J - 1)$ does against
the alternative null hypothesis $H_{J-1}$ that a
polynomial of order $J - 1$ does as good a job
as a polynomial of order $J$. If the hypothesis
$H_{J-1}$ is rejected, stop. If it is not rejected, test
$H_{J-1}$ against $H_{J-2}$, that is, does a polynomial
of order $J - 1$ do better than one of
order $J - 2$. If $H_{J-2}$ is rejected, stop. If
not, continue downward, testing $H_{J-r}$ against
$H_{J-r-1}$ either until $H_{J-r-1}$ is rejected in favor of
$H_{J-r}$ or until the last test, at which $r = J - 1$.


RICHARD S. BOGARTZ


The analysis of variance and a sequence of
$F$ tests can be used to perform the above sequence
of tests of hypotheses. For each value
of $r$, $r = 0, \ldots, (J - 1)$, test $H_{J-r}$ against
$H_{J-r-1}$ using

$$F_{1,\,n_1-J+r-1} = \frac{(n_1 - J + r - 1)\,(A'_{J-r,1}X_{J-r,1}Y_1 - A'_{J-r-1,1}X_{J-r-1,1}Y_1)}{Y_1'Y_1 - A'_{J-r,1}X_{J-r,1}Y_1},$$

where the notation is as above and $n_1$ is the
number of scores in $Y_1$, the $Y$ scores associated
with $X$ values less than or equal to the candidate
$X_c$. A significant $F$ implies rejection
of $H_{J-r-1}$.
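
The downward sequence of $F$ tests can be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive routine: the names `choose_order` and `f_crit` are hypothetical, `f_crit` is assumed to be supplied by the user and to return the tabled critical value of $F$ with 1 and `df2` degrees of freedom at the chosen significance level, the $X$ values are assumed distinct, and there must be more observations than $J + 1$. The numerator of $F$ is computed as the reduction in the deviation sum of squares gained by the highest-order term, which equals the difference of the two $A'XY$ terms in the text.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def poly_sse(xs, ys, order):
    """Deviation sum of squares for the least squares polynomial of given order."""
    rows = [[x ** p for x in xs] for p in range(order + 1)]
    xxt = [[sum(u * v for u, v in zip(r1, r2)) for r2 in rows] for r1 in rows]
    xy = [sum(u * y for u, y in zip(r, ys)) for r in rows]
    coef = solve(xxt, xy)
    return sum((y - sum(c * x ** p for p, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))


def choose_order(xs, ys, max_order, f_crit):
    """Step down from max_order; stop at the first order whose top term is significant."""
    n = len(xs)
    for order in range(max_order, 0, -1):
        sse_full = poly_sse(xs, ys, order)         # polynomial of this order
        sse_reduced = poly_sse(xs, ys, order - 1)  # polynomial one order lower
        df2 = n - order - 1
        if sse_full == 0:
            f = float("inf")                       # exact fit: treat as significant
        else:
            f = (sse_reduced - sse_full) / (sse_full / df2)
        if f > f_crit(df2):
            return order                           # lower-order hypothesis rejected
    return 0                                       # no term beyond the constant needed
```

Applied to the scores below a candidate $X_c$ and then to the scores above it, the routine yields the two orders, after which the two deviation sums of squares are added exactly as in Case I.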

When a stop is achieved such that, say, hypothesis
$H_t$ is accepted, find the sum of squares
for deviations from regression, $Y_1'Y_1 - A'_{t,1}X_{t,1}Y_1$.

Now, still at the same candidate $X_c$ value,
proceed with the same approach to deciding
the value of $k$ for $P_2(k)$, the polynomial to
relate the $X$s above $X_c$ to their associated $Y$s.
If the stop occurs at $k = u$, the sum of squares
for deviations from the second polynomial will
be $Y_2'Y_2 - A'_{u,2}X_{u,2}Y_2$.

The total sum of squares for deviations from
regression for this candidate $X_c$ value will
then be

$$(Y_1'Y_1 - A'_{t,1}X_{t,1}Y_1) + (Y_2'Y_2 - A'_{u,2}X_{u,2}Y_2).$$

Finally, choose that $X_c$ which minimizes this
total sum of squares for deviations from
regression.
Case IV: Locating a region of disturbance
following a polynomial $P_2(k)$ of unknown order
$k$ in a function which is otherwise the polynomial
$P_1(j)$ of unknown order $j$. Only a few words are
needed here. We proceed by analogy to Cases
II and III. We consider all possible candidate
regions $X_c < X \le X_d$ for the region of disturbance
as we did in Case II and decide on
the order of $P_1(j)$ and $P_2(k)$ as we did for Case
III, using the sequence of $F$ tests approach.

Other cases. We can generate many more
cases by considering that for each of the above
cases, one of the polynomials may be of known
order and the other of unknown order. The
appropriate combination of search routine with
computation of the two pieces of deviations
sums of squares, one with the sequential $F$ tests,
the other without, will in all cases generate the
best (least squares) $X_c$ or best $(X_c, X_d)$ pair.
Strong sympathetic activation is not characteristic of the excitement, or
initial phase, of sexual arousal, but is more pronounced in the postintromission 
plateau and orgasmic phases. Heart rate, electrodermal measures, blood 
pressure, pupillary dilation, and catechol amines have shown some response 
during the excitement produced by erotic visual stimulation, particularly 
motion pictures, but the responses are not comparable to those seen during 
sexual activity. These responses are not specific to sexual arousal but may 
reflect orienting to novelty or emotions other than sexual arousal. Measures 
of penile erection have been developed, and have proven useful as specific
sexual arousal indexes. Specific vaginal measures for females are being developed.
Stimuli, set, and subject interactions are important in physiological
sexual arousal.


In an article about sex research appearing 
in the New York Times, Gebhard is quoted as 
saying to Masters, during an automobile trip, 
“Watch where you're going, Bill. If you get us 
all killed, there goes sex research in the United 
States [Buckley, 1969, p. 106]." Although 
Gebhard's statement is somewhat exaggerated, 
it is true that the Kinsey Institute and the 
Reproductive Biology Research Foundation 
represent the major sources of scientific in- 
formation about human sexual patterns. How- 
ever, many investigators are now becoming 
active in this field, and it may be helpful to 
survey some of the physiological methods
currently in use. A later review will deal 
with psychological methods. This article 
deals with quantifiable physiological methods 
of measuring sexual arousal in the human. 
Many of the methods are still in develop- 
mental stages, and details about them were 
provided by generous investigators. It is 


1 This review was supported by the Commission 
on Obscenity and Pornography. A summary of the 
review was read at the Ninth Annual Meeting of 
the Society for Psychophysiological Research,
Monterey, California, October 1969. 

Requests for reprints should be sent to Marvin 
Zuckerman, Department of Psychology, University of 
Delaware, Newark, Delaware 19711. 


hoped that by facilitating scientific com- 
munication, more research will be stimulated 
in this vital area. 

Many important questions seem to be 
awaiting the development of appropriate 
methodology. One example which involves an 
area of public debate is the question of por- 
nography. Every investigation of this problem 
from a legal or social standpoint has con- 
cluded with the statement that not enough 
scientific information is available. It is obvious 
that scientists cannot answer questions, such 
as how arousing pornography is, or who is 
most aroused by pornography, until they can 
decide on reliable methods for measuring 
sexual arousal. Theoretical issues must also 
await the development of suitable method- 
ology. 

Whalen (1966) has made a useful dis- 
tinction between sexual arousal, the momen- 
tary or current level of sexual excitement, 
and sexual arousability, or “an individual's 
characteristic rate of approach to orgasm as a 
result of sexual stimulation [p. 152]." The 
distinction is similar to that made between 
state and trait by other investigators concerning
other emotions (Cattell & Scheier, 1961;
Spielberger, 1966; Zuckerman, Persky, &
Link, 1967). Zuckerman et al. (1967) have
suggested that a trait may be measurable as
the average and variation in a series of state
measures. Applying this to sexual arousability,
it would be possible to define arousability by 
a series of measures of sexual arousal to stand- 
ard stimulation. Beach (1956) has reported 
high reliabilities for various behavioral indexes 
of sexual arousal in rats. Comparable data on 
humans are not available. 

Sexual arousability can be measured by 
behavioral and verbal report measures. 
Latency to orgasm, frequency of orgasm, sub- 
jective estimate of arousal are some alterna- 
tives to physiological recording. However, 
physiological measures offer some obvious ad- 
vantages in their objectivity, continuous 
sampling, and the fact that they can be used 
to measure arousal without the necessity of 
inducing orgasm. Although some mechanical 
devices have been developed for inducing 
orgasm in a somewhat standard fashion 
(Masters & Johnson, 1966; Sobrero, Stearns, 
& Blair, 1965), most measurements which 
rely on orgasm introduce many uncontrolled 
variables, the least of which is movement. 
The social nature of conventional sexual 
stimulation and the influence of the setting in 
which orgasm is induced, as well as the volun- 
teer selection problem, are problems in this 
kind of experiment. Psychological, or visual 
and auditory stimuli are more easily stand- 
ardized and offer the possibility of sampling 
a wider variety of subjects. Furthermore, it 
is questionable that arousal can be measured 
by orgastic ejaculation since the two may de- 
pend on different neural mechanisms. This 
problem is discussed later in this review. 

It should be pointed out that the tremend- 
ous fund of data collected by Kinsey and his 
co-workers (Kinsey, Pomeroy, & Martin, 
1948, 1953) was based on retrospective re- 
ports. The data presented by Masters and 
Johnson (1966) are physiological and ana- 
tomical, but not very quantitative. In de- 
scribing the course of physiological reactions 
during the various stages of arousal, Masters 
and Johnson reported modal reactions and ranges of
reactions with little indication of normal variation.
Some of the changes described may be
intrinsically difficult to measure, but most, 
such as blood volume changes, temperature 
changes, muscle tension, hyperventilation,
tachycardia, and sweating are accessible to
measurement by standard psychophysiological
techniques. Special applications of these techniques
of measuring sexual arousal are described
in this paper. The description of modal
patterns is useful, but more quantitative
specificity is necessary in studying relationships
between variables and comparing individuals
or groups of individuals.


Central Nervous System 


Sexual arousal is undoubtedly mediated 
through the central and autonomic nervous
systems and may also involve the pituitary
gonadotropic and gonadal system. Money
(1961) has summarized:

among the coordinates of sexual function there are
three: local genital surfaces, the brain, the hormones,
any of which can fail in its contributions without
total destruction of sexual function . . . Nonetheless
it is evident that loss of any one of the three constituents
is an immense handicap to effective sexual
functioning [p. 1396].


Beach (1958) has noted that there are species
and sex differences in the dependence of sexual
arousal on the neocortex and hormones. The
evolutionary trend is toward more stress on
the former and less on the latter.
MacLean (1965) has discussed the 
sexual functions of the brain. vis 
olfactory senses are import : 
stimulation of sexual arous 
gested that the visual sense 
important in the course of 
ever, the importance of visu, 
stimulation in the displays a 
ing birds would Suggest that 
of these senses are not con 
arousal centers have been found ;
mammillothalamic tract, anterior thalamic nuclei,
and anterior cingulate gyrus); and parts
of the medial orbital gyrus, medial dorsal 
nucleus of the thalamus, and regions of their 
connections. The medial part of the medial 
dorsal nucleus and medial septopreoptic re- 
gion are said to be modal points for erection. 
Stimulation in the septum and rostral diencephalon
which results in erection is also noted
to be associated with afterdischarges in the 
hippocampus during which time erections be- 
come throbbing in character and reach maxi- 
mum size. Following these hippocampal after- 
discharges, the monkeys appear to be calm 
and placid for some time. There is a strong 
suggestion that the hippocampal discharges 
are linked to the phenomenon of orgasm and 
postorgastic decline of arousability in the 
male. However, ejaculation is not associated 
with the hippocampal afterdischarges. Stimu- 
lation in the thalamus or other points within 
and bordering on the caudal intralaminar re- 
gion and along the course of the spinothalamic 
pathway elicit seminal discharge with motile 
sperm and quasipruritic scratching of the 
genitalia. The seminal discharge could occur
without the appearance of throbbing penile 
erection. Beach (1956) suggested a distinction 
between a sexual arousal mechanism (SAM) 
and an intromissive and ejaculatory mechanism
(IEM) in the male,⁵ and MacLean’s evidence
shows the neural separation of these mecha- 
nisms. Sobrero et al. (1965) used a vibrating 
cup applied with gentle pressure to the glans 
penis to obtain semen samples from schizo- 
phrenic males and males with infertility prob- 
lems. Although ejaculation was eventually 
induced in all 40 infertile males “no full erec- 
tion was ever observed, although in some in- 
stances a partial, very soft erection was ob- 
served at the time of ejaculation [p. 767].” 
Similar results were obtained with five normal 
subjects. Only 5 of these 45 subjects reported 
erotic fantasies or orgasmic feelings. In Hoh- 
mann's (1966) study of the effects of spinal 
cord lesions on emotional feelings, 15 of the 
25 subjects were still able to have erections,
but only 3 reported ejaculation, and 4 reported
the experience of orgasm. These data
provide behavioral evidence for the separation
of SAM and IEM. Other findings relevant to
the independence of ejaculation and orgasm
are discussed by Beach, Westbrook, and
Clemens (1966). The implication of the independent
functioning of these systems is that
latency of ejaculation cannot be equated with
penile erection as alternate indexes of sexual
arousal. The IEM seems to be more dependent
on autonomic and sensory feedback
than the SAM.

⁵ In the later articles Beach referred to these as the
arousal mechanism (AM) and copulatory
mechanism (CM).


Autonomic Nervous System 


The limbic structures which are involved in 
sexual arousal have connections with the 
hypothalamus which in turn may involve the 
autonomic and pituitary-gonadotropic systems. 
Gellhorn and Loofbourrow (1963) stated that 
appropriate hypothalamic lesions may abolish 
sexual behavior; but if the hypothalamus is 
intact, destruction of large parts of the cere- 
bral cortex has little effect (in rats and rab- 
bits). Sex hormones may sensitize the hy- 
pothalamic centers involved in the sex drive. 
These authors maintain that the sex act is 
accompanied by simultaneous parasympathetic 
and sympathetic discharges. Wenger, Jones, 
and Jones (1956) have suggested that the sac- 
ral portion of the parasympathetic nervous 
system dominates in the initial phases of 
sexual arousal, but that the sympathetic sys- 
tem becomes more prominent as orgasm ap- 
proaches. After orgasm there is an overcompen- 
satory phase of parasympathetic dominance. 

Kinsey et al. (1953) reviewed the litera- 
ture (what there was of it) on autonomic 
components of sexual arousal. What is in- 
teresting in this review is the contrast of the 
autonomic patterns in sexual arousal with 
those in anger, fear, epilepsy, and pain. The 
problem of emotional specificity is an im- 
portant one to consider in measuring responses 
to sexual stimuli since these stimuli might also 
elicit secondary reactions whose effects might 
be mistaken for those of sexual arousal. Both 
anger and sexual arousal may elicit increases 
in pulse rate, blood pressure, hyperventila- 
tion, adrenaline secretion, muscle tension, and 
inhibition of gastro-intestinal activity. How- 
ever, sexual arousal is distinct in the “in- 
variable increase” in surface temperatures, 
color, tumescence, genital secretions, rhythmic
muscular movements, and orgasm. Fear may
also increase pulse, blood pressure, breathing 
rate, adrenaline secretion, and muscle ten- 
sion; but the increase in peripheral circula- 
tion of blood, vasodilation, genital secretions, 
salivary secretions, and rhythmic muscle 
movements are more characteristic of sexual 
arousal. Muscular tensions and rigidities
are common to both epileptic seizures and
sexual responses as orgasm
is approached. In some cases, orgasm may
actually occur during epileptic seizures. In 
general, only tumescence, vasodilation, genital 
secretions, and rhythmic muscular movements 
are characteristic of sexual arousal alone. The 
rhythmic muscular movements are a spinal 
neural mechanism and generally are limited 
to the postintromission phase of sexual arousal 
just prior to and during orgasm. 

Masters and Johnson (1966) have sum- 
marized their findings for the four phases of 
sexual arousal: excitement, plateau, orgasmic, 
and resolution. In the excitement phase, penile 
erection, thickening, flattening, and elevation 
of the scrotal integument, and moderate tes- 
ticular elevation and size increase are the 
only typical reactions in the male. Nipple 
erection, vaginal lubrication, thickening of 
vaginal walls, flattening and elevation of 
major labia, and expansion of the vaginal 
barrel are found in women. Only penile erec- 
tion and vaginal lubrication are found in the 
immediate response (3 to 15 seconds) to 
sexual stimulation. Sympathetic system re- 
sponses such as hyperventilation, tachycardia, 
and muscle tension are not characteristic until 
the plateau phase of arousal (approaching 
orgasm). Cowper's gland emissions in the male 
and Bartholin's gland emissions in the female 
are also found in this phase. Sympathetic 
reactions reach a peak during the brief orgasmic
phase when involuntary muscle contractions
appear. Various contractions in
genital and accessory organs and in the ex- 
ternal rectal sphincter are seen during orgasm. 
Sympathetic reactions and vasocongestion di- 
minish gradually during the resolution phase. 
A sweating reaction is seen in 30% to 40% 
of subjects. According to these authors, the 
only specific indexes of sexual arousal which 
could be used early in the excitement phase 
of sexual arousal would be penile erection in 
the male and vaginal lubrication and possibly
nipple erection in the female. A “sex
tension flush” is seen in a minority of females
(25%) during the excitement phase,
but only comes into prominence in a majority 
(75%) of females in the plateau phase. 
Masters and Johnson's description of the 
stages of arousal seems to support Wenger et 
al.’s (1956) theory of the predominance of 
parasympathetic phenomena during the early 
stages of arousal and the appearance of sym- 
pathetic phenomena during the later preor- 
gasmic phases of arousal. 

Perhaps Wenger’s theory may be applied to 
the two mechanisms postulated by Beach: 
parasympathetic activity stimulating SAM 
and sympathetic activity associated with IEM. 
Some interesting and relevant data are avail- 
able on the sexual effects of tranquilizing and 
antidepressant drugs. Blair and Simpson 
(1966) reported that tranquilizing drugs 
(Perphenazine, Trifluoperazine, Butaperazine, 
and Reserpine), which act as central sympa- 
thetic nervous system depressants, interfere 
with emission and ejaculation, often resulting 
in interference with ejaculation. The authors * 
have also reported that only 6 of 60 chronic 
patients on the drug Thioridazine (also a 
central autonomic depressant) could experi- 
ence ejaculation; and 10 normals on Tofranil 
all had some degree of retarded ejaculation. 
Many other investigators have reported in- 
hibitions of ejaculation produced by Thiorida- 
zine (Cohen, 1964). Money and Yankowitz 
(1967) found that the sympathetic inhibiting 
drug Ismelin produces ejaculation problems. 

On the other hand, Simpson, Blair, and 
Amuso (1965) found that antidepressant 
MAO-inhibitors, which have indirect sympa- 
thetic stimulation effects, and antiparasympa- 
thetic effects, can interfere with erection and 
cause impotence. When the drugs are discontinued,
potency usually returns.

The possibility that sympathetic dominance 
may inhibit arousal and facilitate ejaculation 
may explain why sexual anxiety may be ex- 
pressed in an inability to attain or maintain 
an erection, or premature ejaculation. Assuming
that anxiety creates a state of heightened
autonomic arousal, these effects would follow.


* G. M. Simpson and A. J. Blair, personal communication, August 1, 1969.


Hormones


Although a large amount of data has been 
collected on the role of hormones in sexual 
behavior in lower species (Beach, 1948, 1958, 
1965) their role in human sexual arousal is 
not well defined. One of the reasons for this
state of affairs is that extirpation experiments 
can be done freely with animals, but human 
data must be obtained from clinical case 
studies. Another reason is that sex hormone 
determinations have only recently been im- 
proved so that their total production rate can 
be measured. Formerly, investigators had to
rely on unstable plasma measures or the 
metabolic end products of the hormones. 

Money (1961) has reviewed the evidence 
from clinical studies including his own exten- 
sive series of cases. Sex hormones play a 
crucial role in the growth of the genital 
structures in man, as in lower species. How- 
ever, anatomy is not necessarily destiny, 
Freud to the contrary. The gender role and 
erotic orientation of hermaphrodites, for in- 
stance, is determined by the social role as- 
signed to them after birth. While hormones 
do not seem to play an important role in 
determining the direction of sexual interests, 
they may have crucial consequences for the
strength of the sex drive or sexual arousability. 
Androgen stimulates growth and dilation of 
the vasculature of the penis and clitoris. The 
maintenance of erection by engorgement of 
the penis with blood is facilitated by andro- 
gen. “Tumescence of the penis can occur in 
the absence of androgen, but the erection is 
generally not complete and long lasting [p. 
1387].” “Erotic drive” in hypogonadal males
is generally heightened by androgen administration.
Withdrawal of androgen, or substitution
of placebo, generally results in a loss of
arousability. In one group of hypogonadal 
cases where androgen administration was 
stopped, the ejaculate diminished in volume 
until no fluid was emitted, and the men re- 
ported that they had fewer erections and less 
urge to masturbate or initiate heterosexual ac- 
tivity. Even reports of erotic imagery and 
daydreams were reduced. 

The ovaries in women do not seem to be 
crucial for sexual arousability. Estrogen in 
women facilitates vaginal lubrication, but lack
of lubrication is not an insurmountable problem
in sexual intercourse. Money (1961)
marshals considerable clinical evidence to
support the hypothesis that androgen is the 
hormone which is related to sexual arousa- 
bility in women as well as in men. Androgens 
in women probably originate in the adrenals. 
Many women who receive androgen therapy 
report increased sexual desire. Androgen sensi- 
tizes the clitoris and, if prolonged, may re- 
sult in hypertrophy of the clitoris. Hyper- 
adrenal pseudohermaphroditic females pro- 
duce high levels of androgen, but the effects 
on erotic behavior are variable. 

Most cases of impotence in males and 
frigidity in women are not expressions of 
hormonal insufficiencies and do not respond 
to treatment with additional exogenous hor- 
mone. Sex hormones probably operate to 
lower thresholds for sexual arousal, but beyond 
a certain level additional hormone supplies 
may not make much difference. 

Despite the importance of sex hormones 
(and probably the gonadotropic hormones) 
in sexual arousability, this writer has been 
unable to locate a single published study of 
the effects of sexual arousal on sex hormone 
levels in the human. Masters reported that 
such studies are going on in his laboratory, 
but it will be several years before these data 
will be made available. 

Because of the important role of the au- 
tonomic nervous system in sexual arousability 
the adrenal medullary hormones may play a 
role in general arousal. Several studies on the 
effect of sexual arousal on these hormones are 
discussed later in this review. 


Electrodermal Measures 


It is of some historical interest to note
that Wilhelm Reich (1937) experimented with 
skin potential measurement in an attempt to 
provide empirical evidence for a theory of 
the electrical nature of sexual excitement. He 
applied electrodes to various body parts in- 
cluding the penis, vaginal mucosa, tongue, lips, 
anal mucosa, nipple, palm of the hand, ear- 
lobe, and forehead. He claimed that erogenous 
zones (all of the aforementioned were con- 
sidered erogenous) have a much higher po- 
tential than nonerogenous zones of the body 
and that sensations of pleasure are associated 
with rises in potential while sensations of
displeasure are associated with falls in po- 
tential. His crude apparatus and unusual ex- 
perimental techniques, such as using the elec- 
trode itself to stimulate the site, make his 
data questionable. Little need be said about 
the somewhat grandiose theory extrapolated 
from the data. 

Davis and Buchwald (1957) used electrodermal
and other autonomic measures of response
to see if different types of pictures
produce different kinds of somatic response.
Twelve pictures were projected on lantern
slides. There were two pictures for each of
six categories: cartoons, landscapes, female
nudes, horror (e.g., photograph of a starving
man), fear (e.g., photograph of alligator
head), and geometrical abstractions. Each
picture was presented for 1 minute followed
by a 1-minute rest period. Three electrodermal
measurements were made from palmar electrodes:
(a) maximum skin resistance decrease
in the first 10 seconds of picture presentation
expressed in percentage of base level; (b) skin
resistance change from the beginning to the
end of each stimulus presentation expressed
as percentage change; (c) number of galvanic
skin responses (GSRs) during the stimulus.
The 12 stimuli were ranked in order of the
mean magnitudes of response for each of the
11 physiological measures used including the 3
electrodermal measures. The ranks of the
mean responses to the 12 stimuli for each
measure were correlated with sums of the
ranks on all measures to provide a measure
of response generality. The ranks of pairs of
stimuli (N = 6) were also correlated to see
if the responses to classes of stimuli differ in
a reliable fashion. For the males there was
significant concordance among response variables;
for females there was not. Similarly,
for males, there was significant group reliability
of combined responses to the pairs of
pictures; for females there was not. For both
sexes the two pictures of nudes elicited greater
combined response than the other pictures.
For males this was also true for the initial
galvanic skin response, and the net skin resistance
change where Ranks 1 and 2 were
assigned to the nudes. Both of these measures
also correlated highly and significantly with
the combined ranks and yielded significant
correlations between pairs of pictures. The
number of GSRs did not yield significant correlations
with variables or across pairs of
pictures. While the method of data analysis 
in this paper leaves something to be desired 
(individual data, even 
examined), 
ential autonomic responses do occur to cate- 
gories of pictures, that female nudes have a 
highly stimulating value, and that GSR and 
skin resistance seem to offer a reli 
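
The rank-based index of response generality described for Davis and Buchwald (1957) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the data are hypothetical, ties receive average ranks, and the summed ranks are themselves re-ranked before correlating (a Spearman-type coefficient), a detail the review does not specify.

```python
from statistics import mean


def rank_descending(values):
    """Rank values so the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation; applied to ranks it is Spearman's rho."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def response_generality(measure_means):
    """Correlate each measure's stimulus ranks with the summed ranks.

    measure_means maps a measure name to its mean responses, one per stimulus.
    """
    ranks = {name: rank_descending(v) for name, v in measure_means.items()}
    n_stimuli = len(next(iter(ranks.values())))
    rank_sums = [sum(r[i] for r in ranks.values()) for i in range(n_stimuli)]
    # Smallest rank sum = strongest combined response, so it gets rank 1.
    combined = rank_descending([-s for s in rank_sums])
    return {name: pearson(r, combined) for name, r in ranks.items()}


# Hypothetical mean responses of three measures to four stimuli.
example = {"A": [5, 3, 1, 4], "B": [10, 6, 2, 8], "C": [1, 2, 3, 4]}
print(response_generality(example))
```

A measure that orders the stimuli in the same way as the combined ranking yields a coefficient near 1, which is the sense in which it reflects a general response.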


Loisselle and Mollenauer (1965) used male 
stimuli. Within each 
set there were three clothed, three seminude,
and three nude figures. The subjects were 20 
These women showed significantly
greater GSRs to nude than to


to nude female figures. The 
authors were careful to say that the GSR may 
of the fact that there were large differences
in Semantic Differential ratings of the two 
types of stimuli by another group of males. 

Martin (1964) added another dimension to 
the standard experiment comparing reactions 
to nudes and landscapes. In half of his sub- 
jects the presentation of pictures was pre- 
ceded by a permissive set, and in the other 
subjects by an inhibitory set. The measure of 
response was the change in skin conductance 
from the beginning to the end of an entire 
series of pictures. One group was shown six 
pictures of Playboy-type nudes interspersed 
with six pictures of landscapes. Another group 
was shown 12 pictures of landscapes. 

In the first experiment, both picture type 
and the interaction between set and picture 
type were significant. There was less drop in 
skin conductance when nudes were being 
shown, and this difference between stimuli
was more marked after an inhibitory set. In 
the second experiment, only the set variable 
was significant. This experiment underlines 
the importance of the social setting in which 
an experiment is performed. 
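
A measure such as the change in skin conductance from the beginning to the end of a series can be computed from resistance readings, since conductance is the reciprocal of resistance. The sketch below is illustrative only: the function names, the units (kilohms and micromhos), the optional log transform, and the readings are assumptions and are not taken from Martin's report.

```python
import math


def conductance_micromho(resistance_kilohm):
    """Convert skin resistance in kilohms to conductance in micromhos."""
    return 1000.0 / resistance_kilohm


def conductance_change(start_kilohm, end_kilohm, use_log=False):
    """Change in conductance from the start to the end of a series.

    With use_log=True the change is taken in log10 conductance units.
    """
    start = conductance_micromho(start_kilohm)
    end = conductance_micromho(end_kilohm)
    if use_log:
        return math.log10(end) - math.log10(start)
    return end - start


# Hypothetical readings: resistance falls from 100 to 50 kilohms,
# that is, conductance rises from 10 to 20 micromhos.
print(conductance_change(100.0, 50.0))
print(conductance_change(100.0, 50.0, use_log=True))
```

A negative value would correspond to the drop in conductance over a series that is described in the text.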

Autonomic variables are not immune from 
this type of effect. For instance, Zuckerman, 
Persky, and Link (1969) showed that breath- 
ing and electrodermal responses to sensory 
deprivation may depend on the set given to 
subjects in the instructions and by the total 
experimental setting. 

Speisman, Lazarus, Davison, and Mordkoff 
(1964) attempted to separate the effects of 
male nudity from those of mutilation in the 
film depicting Australoid Aborigine circum- 
cision rites. The film was analyzed by dividing 
it into three sections: neutral, nudity, and 
mutilation. The subjects were 12 male and 
12 female undergraduates. All subjects viewed 
all three sections of the film in a counter- 
balanced order. Skin conductance and heart 
rate were sampled at 10-second intervals dur- 
ing the film periods and for 2.5 minute base- 
line periods. Analysis of covariance was used 
to eliminate the influence of base-line levels. 
Mutilation scenes elicited greater skin con- 
ductance and heart rate increases than neutral 
scenes; nude scenes did not elicit responses 
different from those to neutral scenes on 
either measure. The predominant mood, elic- 
ited by both mutilation and nudity sections 
of film relative to the neutral section, was
tension. The fact that nude Australoid Ab- 
origines elicit little sexual arousal in young 
female or male undergraduates is not sur- 
prising. 

Koegler and Kline (1965) used the aborig- 
ine circumcision rite film along with two 
sexual films and three neutral films to com- 
pare the responses of 20 male medical stu- 
dents, 10 male undergraduates, and 20 female 
undergraduates. The film intended to induce 
heterosexual arousal in males showed nude 
and seminude women in striptease sequences. 
Another film showed two seminude males in 
wrestling and massage sequences and was 
aimed at homosexual arousal. They used mea- 
sures of palmar skin resistance, GSR lability 
(number of fluctuations), heart rate, finger 
pulse volume, and respiration. Unfortunately 
their data were not presented in a form that 
would make comparison on the separate au- 
tonomic measures between films and subject 
groups easy to evaluate. The following dis- 
cussion of their results is merely taken from 
the text of the article. The subincision film re- 
sulted in significant changes in GSR level and 
lability and heart rate, particularly in the male 
and female college students. Little effect was 
seen on blood volume and respiration measures. 
The medical students showed less subjective 
and autonomic reaction to this film, viewing it 
more intellectually and less empathically. 
However, all males enjoyed the heterosexual 
movie and found it exciting. They showed 
autonomic reactions of a magnitude compar- 
able to those of the girls to the subincision 
movie. The authors felt that arousal in posi- 
tive and negative affective states is not dis- 
tinguishable: “Thus it seems that the direc- 
tion of autonomic change is independent of 
the nature of the psychologic stress [p. 274].” 
However, they noted that the males showed
more autonomic reaction to the heterosexual 
than to the homosexual movie, and they found 
the latter unpleasant. Three homosexuals 
showed a stronger reaction to the homosexual 
movie than to the heterosexual movie. The 
results suggest that intensity of nonspecific 
autonomic reactions might be used to differ- 
entiate sexual arousal to preferred and non- 
preferred sexual objects, but cannot be used 
to distinguish between positive and negative
affective arousal. However, the stimuli and the
affective reactions are confounded. Homo- 
sexual and heterosexual movies are likely to 
stimulate both positive and negative arousal, 
depending on the type of subject and the 
style of the movie itself. 

Romano (1969) compared the electrodermal
responses of 39 married male subjects to (a)
control stimuli consisting of neutral slides,
(b) an erotic motion picture depicting a
couple engaged in coitus, and (c) a film showing
scenes from World War II Nazi concentra- 
tion camps (gas chambers, corpses). The 
sexual and atrocity films were presented in a 
counterbalanced order with half of the sub- 
jects watching one film first, while the others 
viewed the other film first. Both of the films 
resulted in significant increases in spontaneous 
GSR activity relative to the control stimuli. 
Although the change was greater to the sexual 
film, the difference in responses to the two 
films was not significant even though the two 
films were rated differently on an affect check- 
list. Neither film produced significant changes 
in basal skin resistance. The results indicate a 
lack of affect response specificity. Although 
the subjects reported that the concentration 
camp film aroused unpleasant affect, and the 
sexual film aroused pleasant affect, the spon- 
taneous GSR activity did not differentiate 
these affective reactions. 

Roessler⁶ examined the responses of male
subjects to a sexual movie and a control movie
(a piano recital). His subjects showed greater
skin conductance increase in response to the
erotic film than to the piano recital although
the difference only approached significance.
Data from an affect checklist test indicated 
that the erotic film also elicited more anxiety 
than the control film. 

Fisher and Osofsky (1968) recorded skin
resistance and spontaneous GSRs in 42 married
women during three sessions: a control
session while they were fully clothed and
lying quietly; a second session during which
a male gynecologist examined electrodes
placed on the breast and labia and made
“touch threshold” determinations; and a third
session in which a standard gynecological examination
was made. It is possible, but not
probable judging from most women's reports,
that the gynecological examinations produced
sexual arousal. At any rate, no self-ratings
were made by the subjects on their state of
arousal during the examinations. Whatever
the cause, anxiety or sexual arousal, the examinations
produced greater GSR frequency
and amplitudes, and lower palmar and leg
skin resistance than the control session. The
authors also measured skin resistance of the
labia and breast in Sessions 2 and 3. While
there was no base line to assess these comparisons
against, it is interesting that skin
resistance on the labia was lower than on the
palm, while skin resistance on the breast was
higher. These authors were the first since
Reich to make electrical recordings in these
erogenous areas of skin.

⁶ Roessler, R., and Collins, F. Physiological responses
to sexually arousing motion pictures. Paper
delivered at Ninth Annual meeting of the Society
for Psychophysiological Research, Monterey, California,
October 1969.

Another method of stimulating sexual 
arousal is to visually present erotic passages 
in printed form. This method demands more 
imagination from the subject. However, in 
view of the fact that women report less arousal 
than men in response to pictures and movies, 
but report comparable arousal in response to 
erotic passages in literature (Kinsey et al., 
1953), this method of presentation may be 
useful in making comparisons of the sexes. 
Jordan and Butler (1967) used four passages
from fiction presented on cards: two descriptions
of sexual seduction and two neutral
scenes. The subjects were 32 females: 16 high
on the hysteria scale of the MMPI and 16
low. The experimenter was female. Sex and
neutral themes were alternately presented
with a 45-second pause between them. Significant
differences in skin resistance change
were found for diagnoses (high versus low
hysteria scores), passages (sex versus neutral),
and the interaction. Sexual material elicited
more response in skin resistance than neutral
material. High hysteria scorers showed more
response than low scorers to the sexual passages,
but these groups did not differ in response
to the neutral passages.

Wenger, Averill, and Smith (1968) studied
the responses of 16 male subjects while reading
erotic and innocuous passages presented
on slides. Palmar skin conductance was 1
of 10 autonomic functions measured; it proved
to be the most sensitive one, yielding highly 
significant differences between the erotic and 
control passages. Number of GSRs during 
presentation of the passages was also tabu- 
lated, but no results on this variable were 
reported. 

Physiological indexes have been increasingly
employed in the study and treatment of sexual
deviants. Solyom and Beck (1967) used three
fetishists and one homosexual as subjects in
a study of electrodermal reactions to pictures
of the fetish objects (a seminude male in the
case of the homosexual patient). Neutral geometrical
designs and seminude female figures
were also used as stimuli. Each picture was
presented on a slide for a period of 1 minute.
The electrodermal indexes of reaction
included fall in skin resistance during
the first GSR (percentage of prestimulus
level), latency of first GSR, latency of maximum
fall in skin resistance, recovery time to
regain prestimulus level, change in skin resistance
over 1-minute intervals (expressed as
percentage of base line), and number of GSRs
over each of the 1-minute intervals. Of the
six electrodermal measures used, only amplitude
of first GSR, recovery time, and change
in basal skin resistance showed much variation
from picture to picture. Analysis of variance
was used to compare the effects of stimuli
and trials (each type of stimulus was presented
four times). Significant differences were
obtained between stimuli: the fetish object (seminude
male in the case of the homosexual)
and the seminude female elicited a
greater GSR amplitude than the neutral object,
but did not differ significantly from each
other. A somewhat weaker trials effect was
also found, with habituation from the first to
the fourth trial.
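
Several of the indexes listed above can be derived from a sampled skin resistance record. The following sketch is a simplified illustration, not Solyom and Beck's scoring procedure: the function name, the sampled trace, and the decision to time recovery from the point of maximum fall are all assumptions, and the maximum fall is used in place of a scored first GSR.

```python
def resistance_indexes(times, resistance, prestimulus):
    """Derive simple electrodermal indexes from a sampled resistance trace.

    times[0] is assumed to be stimulus onset; resistance and prestimulus
    are in the same units (for example, kilohms).
    """
    lowest = min(resistance)
    i_min = resistance.index(lowest)
    # Fall in resistance expressed as a percentage of the prestimulus level.
    percent_fall = 100.0 * (prestimulus - lowest) / prestimulus
    # Latency of maximum fall, measured from stimulus onset.
    latency_of_max_fall = times[i_min] - times[0]
    # Recovery time: from maximum fall until the prestimulus level is regained.
    recovery_time = None
    for t, r in zip(times[i_min + 1:], resistance[i_min + 1:]):
        if r >= prestimulus:
            recovery_time = t - times[i_min]
            break
    return {
        "percent_fall": percent_fall,
        "latency_of_max_fall": latency_of_max_fall,
        "recovery_time": recovery_time,
    }


# Hypothetical trace sampled once per second after stimulus onset.
times = [0, 1, 2, 3, 4, 5]
trace = [100, 90, 80, 90, 100, 100]
print(resistance_indexes(times, trace, prestimulus=100))
```

If the prestimulus level is never regained within the record, the recovery time is returned as None rather than guessed.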

Steffy? has been attempting to develop a 
Sexual Attractiveness Scale (SAS), which 
measures physiological (GSR) arousal and 
verbal report of preference to a range of 

© visual sexual stimuli, The stimuli consist of 
pictures in 18 categories representing the 
combinations of the following 3 major cate- 


7 Steffy, R. A. Progress report on the treatment 
of the pedophile offender. Paper delivered at 
Fourth Annual Conference on. Addictions and Sexual 
Deviation, Mimico, Ontario, April 1967. 
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gories: sex (male, female), age (child, pubes- 
cent, adult), and state of dress (dress, par- 
tially dressed, and nude). Five pictures repre- 
sent each of the categories and 10 geometrical 
forms are added as neutral items. The subjects 
observe each picture for 10 seconds, at 
which time GSR is recorded from finger elec- 
trodes. At the end of each 10-second period, 
the subject rates the attractiveness of the 
picture on a 10-point scale. Mean conduct- 
ance-change data are presented on some male 
subjects. The normal control group showed a 
major interest in adult female pictures in all 
stages of dress as reflected in both ratings 
and conductance change, There was only a 
slight indication of a gradient going from 
dressed to partly dressed to undressed. Hetero- 
sexual prisoners showed a pattern of response 
similar to normal males. The homosexual 
prisoners showed somewhat more response to 
nude adult and pubescent males than the 
heterosexual prisoners, but their greatest response
was still to nude adult females. Heterosexual
pedophiles rated all females higher
than all males, and their GSRs showed greater 
response to females than to males; however, 
only the GSR to the nude and dressed female 
children was significantly different from the 
response to male figures. Homosexual pedo- 
philes expressed greater interest in both pu- 
bescent males and adult females than in other 
categories, and the GSR reflected this interest. 
Both homosexual groups in the experiment 
expressed interest in both males and females, 
but the authors cautioned that the GSR may 
be measuring negative, as well as positive af- 
fective responses. The psychophysiological 
technique developed by the author is a promis- 
ing one; but the sampling of stimuli is not 
balanced by a sampling of autonomic re- 
sponses. Better discrimination of sexual types 
has been achieved using penile erection mea- 
surements; these techniques are discussed in 
a later section. 
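The stimulus design described for the SAS can be laid out as a small sketch. The category labels follow the text; the tuple layout and the function name `build_stimulus_set` are illustrative assumptions, not part of Steffy's procedure.

```python
from itertools import product

# Category levels as listed in the text.
SEXES = ("male", "female")
AGES = ("child", "pubescent", "adult")
DRESS_STATES = ("dressed", "partially dressed", "nude")
PICTURES_PER_CATEGORY = 5   # five pictures per category
NEUTRAL_ITEMS = 10          # geometrical forms used as neutral items


def build_stimulus_set():
    """Return (categories, stimuli) for the design described in the text.

    The tuple layout is a hypothetical convenience, not Steffy's format.
    """
    categories = list(product(SEXES, AGES, DRESS_STATES))
    stimuli = [
        (sex, age, dress, k)
        for sex, age, dress in categories
        for k in range(1, PICTURES_PER_CATEGORY + 1)
    ]
    stimuli += [("neutral", None, None, k) for k in range(1, NEUTRAL_ITEMS + 1)]
    return categories, stimuli


categories, stimuli = build_stimulus_set()
print(len(categories), len(stimuli))  # 18 100
```

Under these counts the full set comes to 90 pictures plus 10 neutral forms.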

Barlow, Leitenberg, and Agras (1969) have 
used an electrodermal method to measure 
changes in response to imagined scenes during 
the course of behavioral therapy. They re- 
ported on two subjects—a male pedophiliac 
and a male homosexual. The therapy consisted 
of an association of the fantasy arousal stimuli 
with a covert noxious stimulus (imagining 
nausea and vomiting). Electrodermal arousal
was measured as change in log conductance 
over each of the four experimental periods: 
base line, acquisition, extinction, and reac- 
quisition. During acquisition, fantasied sexual 
stimuli were associated with fantasied nausea 
and vomiting. In the pedophiliac, there was a 
dramatic drop in conductance during acquisi- 
tion, a rise during extinction, and a drop 
during reacquisition. In the homosexual, con- 
ductance declined during acquisition, but did 
not recover during extinction even though re- 
ports of subjective homosexual arousal in- 
creased during that period. 
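A minimal sketch of a change-in-log-conductance score is given here. It assumes the score for a period is the log conductance at the end of the period minus that at the start; the source does not spell out the exact computation, and the numeric values are hypothetical.

```python
import math


def log_conductance_change(start_conductance, end_conductance):
    """Change in log10 conductance across one experimental period.

    Assumption: the change is taken as end minus start; both values
    must be positive and in the same units (e.g., micromhos).
    """
    if start_conductance <= 0 or end_conductance <= 0:
        raise ValueError("conductance values must be positive")
    return math.log10(end_conductance) - math.log10(start_conductance)


# Hypothetical start/end readings for the four periods named in the text.
periods = {
    "base line": (10.0, 10.0),
    "acquisition": (10.0, 5.0),
    "extinction": (5.0, 8.0),
    "reacquisition": (8.0, 4.0),
}
for name, (start, end) in periods.items():
    print(name, round(log_conductance_change(start, end), 3))
```

A negative score indicates a drop in conductance over the period, a positive score a rise.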

Galvanic skin reactions to pictures of nude
females are quite pronounced in most studies,
exceeding responses to clothed figures and
neutral forms. Similarly, electrodermal reactions
seem to be sensitive measures of response
to the reading of erotic material,
fantasying of erotic scenes, and the viewing of
movies. Attempts to show differential GSRs
to male and female sexual stimuli have not
generally been successful. Female nudes seem
to have a great arousal effect for heterosexual,
homosexual, and fetishist males. Several authors
have cautioned against interpreting electrodermal
responses as specific measures of
sexual arousal since they are known to be
equally responsive in negative affective reactions.
This caution is well taken. It is conceivable
that a group of males might show
equal electrodermal responsivity to male and
female nudes because the former might elicit
anxiety or surprise while the latter might
elicit sexual arousal. From the electrodermal
responses alone, there might be no way to
differentiate these reactions.


Cardiovascular Changes 


Cardiovascular responses during sexual 
arousal may be divided into two kinds: local
vasocongestion in the primary and secondary 
erogenous zones and more general reactions, 
such as increased heart rate and blood pres- 
sure. The vasocongestion is the cause of tu- 
mescence of the penis, and the clitoral glans, 
increase in diameter of the clitoral shaft, in- 
crease in breast size, and other changes oc- 
curring in the labia, vagina, and uterus 
(Masters & Johnson, 1966). The initial sexual
excitement is expressed in a dilation of blood 
vessels carrying blood to the primary erogenous
zones of the body, and probably in a
vasoconstriction of vessels leading away from 
these zones. Pronounced tachycardia and ele- 
vations in blood pressure are said to be more 
characteristic of the plateau phase of arousal, 
reaching a peak in the orgasmic phase of 
arousal. Masters and Johnson (1966) re- 
ported recorded heart rates in males and fe- 
males averaging from 100 to 175 beats per 
minute (bpm) during the plateau phase and 
110 to 180+ bpm during orgasm. Systolic 
blood pressure elevations of 20-60 millimeters 
in the orgasmic phase are reported in the 
female; and 20-80 millimeters in the plateau 
phase, and 40-100 millimeters in the orgasmic 
phase are reported for the male. Masters and 
Johnson claimed that heart rate and blood 
pressure elevations in the excitement phase increase
"in direct parallel to rising tension,"
but it is not clear if these reactions would be
useful in measuring responses to visual stimuli.
Presumably, the authors are referring to increases
occurring with genital manipulation.
Masters and Johnson (1966) presented
sample electrocardiogram recordings of one
male and one female during manipulation,
orgasm, and postorgasm resolution. It is apparent
that marked increase in heart rate occurs
within about half a minute after the
start of manipulation and reaches a peak
during orgasm. Clearer recordings of rate
are presented by Bartlett (1956). This author
recorded heart and breathing response in
three couples during foreplay, coitus, and
orgasm. Each subject signaled intromission,
orgasm, and withdrawal by pressing a button.
Marked heart rate fluctuation is seen in both
sexes during foreplay prior to intromission,
but the constant [...] hyperventilation [...]
intromission. Heart rates approaching 170 bpm were
recorded during orgasm. A marked parallelism
of heart rates of the sexual partners
was seen after intromission although orgasms
were not simultaneous, sharp [...] in [...]
manipulation, which involves much less muscular
activity.

Davis and Buchwald (1957) measured vol- 
ume pulse amplitude from the finger, pulse 
cycle time, pressure pulse amplitude, bone 
cardiogram amplitude, finger volume, and chin 
volume in their study of responses to pictures 
(described in the previous section on electro- 
dermal measures). Volume pulse, pressure 
pulse, and pulse time showed reliability for 
responses to pairs of pictures of similar con- 
tent. Only volume pulse and pulse time showed 
a correlation with combined ranks for all 
pictures (a measure of generality of arousal). 
Only pressure pulse showed the high arousal 
to both sexual stimuli found in the GSR 
variables. 

Wenger et al. (1968) used systolic and 
diastolic blood pressure, heart rate, and finger 
pulse volume in their study of reactions to
visually presented erotic prose (described in 
prior section). Only systolic and diastolic 
blood pressure significantly differentiated re- 
sponses to erotic and control slides. Maximal 
mean rises of about 4 millimeters of mercury 
were found for systolic blood pressure, and 
about 5 millimeters of mercury for diastolic 
blood pressure. Mean heart rate showed no 
change in response to erotic or control slides. 
Finger pulse volume showed a slight biphasic 
response, first decreasing then increasing; 
however, the change was not significant. While 
the mean changes in blood pressure were sig- 
nificant, they were minimal compared to the 
changes reported by Masters and Johnson 
during manipulation. 

Heart rate changes to visual or auditory input
may be biphasic (Lacey, Kagan, Lacey, &
Moss, 1963), first showing deceleration, then
acceleration in the return to base line. Pro- 
cedures which use average heart rate may 
average out significant responses if the bi- 
phasic nature of the heart rate is not con- 
sidered. 
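The averaging problem can be illustrated with a short sketch. The second-by-second values below are hypothetical and chosen only to show how a deceleration followed by an acceleration can cancel in a window mean; the function names are illustrative.

```python
from statistics import mean


def window_mean_change(baseline_bpm, samples_bpm):
    """Mean heart rate over the stimulus window minus the base-line rate."""
    return mean(samples_bpm) - baseline_bpm


def peak_changes(baseline_bpm, samples_bpm):
    """Largest deceleration and largest acceleration relative to base line."""
    deltas = [s - baseline_bpm for s in samples_bpm]
    return min(deltas), max(deltas)


baseline = 70.0
# Hypothetical biphasic response: deceleration first, then acceleration.
samples = [67.0, 66.0, 68.0, 70.0, 72.0, 74.0, 73.0, 70.0]

print(window_mean_change(baseline, samples))  # 0.0
print(peak_changes(baseline, samples))        # (-4.0, 4.0)
```

In this constructed case the window mean shows no change at all, while the second-by-second record shows swings of 4 bpm in each direction.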

Wood and Obrist (1968) used Playboy
nude pictures as unconditioned stimuli in a 
conditioning task. A red light was used as a 
conditioned stimulus which preceded the nudes 
by 7 seconds. Heart rate was recorded for 
each second subsequent to the conditioned 
stimulus and following the unconditioned 
stimulus, which was presented for 8 seconds. 
Our main interest is in reinforced trials where
the nudes were actually presented. In the 
first experiment with the nude unconditioned 
stimuli, no significant trends were found on 
reinforced trials. In postexperimental reports 
some subjects reported the nudes to be mo- 
notonous or tedious (Hugh Hefner take 
note!). A second experiment was designed to 
increase the subjects' motivation to look more
carefully at the nudes by offering money as 
a reward for answering a postexperimental 
questionnaire regarding pictures. Using this 
procedure, a significant effect was found on 
reinforced trials. A deceleration was found in 
the second prior to the stimulus presentation 
and during the first second of the uncondi- 
tioned stimulus presentation. In the next 3 
seconds, an acceleration was seen, followed 
by another deceleration in the last 3 seconds. 
The data of these authors suggest that (a) 
Playboy nudes are not very arousing stimuli 
for North Carolina undergraduate males; (b) 
even when subjects were motivated to attend 
to the nudes, the heart rate response is a 
very brief biphasic one; (c) if one wishes to 
use heart rate as a measure of sexual re- 
sponse to pictures, heart rate must be re- 
corded beat by beat using a cardiotachometer. 

Bernick, Kling, and Borowitz (in press) 
measured heart rate in nine male subjects dur- 
ing presentation of neutral slides, a hetero- 
sexual “stag” film, a homosexual “stag” film, 
and an Alfred Hitchcock suspense film. Mean 
heart rates were calculated for the 4-minute 
slide periods and the 16 minutes of each film. 
Using the heart rate during the slides as a 
base line, the mean changes were +7 bpm during
the heterosexual film, +6 bpm during the
homosexual film, and +4 bpm during the suspense
film. The increases were significant for
the heterosexual and suspense films but not 
significant for the homosexual film; none of 
the differences between films was significant. 
Subjects reported the percentage of viewing 
times that they had an erection. Seven of 
the eight subjects reported more erection dur- 
ing the heterosexual than the homosexual 
movie. While such self-reports may be of 
dubious reliability, it is interesting that there 
was no significant relationship between re- 
ports of erection and increases in heart rate, 
and the correlation was actually negative in
the heterosexual film session. 

Corman (1968) examined the effect of 
slides of Playboy nudes and an erotic motion 
picture on eight autonomic variables includ- 
ing heart rate. The subjects were 10 young 
married men. Reactions were measured in re- 
sponse to a sound stimulus, 50 control slides 
(Expo '67), 15 Playboy nude slides, and an 
erotic motion picture. The materials were 
presented in the sequence listed. The motion 
picture was made for the experiment and 
portrayed two people making love in bed, 
without explicit views of genitalia. Soft music 
was played with the movie. These details of 
the movie presentation are interesting in view 
of the positive results obtained. Apparently 
“hard-core” pornography is not necessary to 
elicit arousal. The physiological data were 
examined for the sound stimulus response and 
at standard points in the slide presentations 
and during crucial scenes in the motion pic- 
ture. Heart rate did not increase significantly 
going from the control slides to the Playboy 
slides. However, the erotic movie significantly 
increased heart rate by an average of 5 bpm 
during the most arousing scene. Heart rate 
was significantly higher during the movie 
scenes than during the control slides, the 
nude slides, and the noise stimulus. Both 
systolic and diastolic blood pressures showed 
significant increases of about 5 millimeters 
going from control slides to slides of nudes, 
and further significant increases of 11 milli- 
meters (systolic) and 6 millimeters (diastolic) 
going from the nudes to the peak scenes of 
the movie. The response to the movie was 
significantly greater than the response to the 
slides of nudes and the noise stimulus. 

Romano (1969) used the same erotic mo- 
tion picture used by Corman, but Romano’s 
subjects also viewed a concentration camp 
movie which aroused negative affect. The de- 
tails of this experiment were described in the 
prior section on electrodermal measures. As 
in Corman’s experiment, the erotic movie re- 
sulted in significant increases in both systolic 
and diastolic blood pressure. The negative- 
affect arousing film also resulted in significant 
increases in blood pressure, and the differences 
between the films in the blood pressure effects 
were not significant. The changes in heart rate 
and heart rate variability produced by either
film were not significant, disconfirming the 
heart rate effects of the erotic movie found 
by Corman. 

Roessler (see Footnote 6) found signifi- 
cantly greater increases in pulse amplitude 
and heart rate in response to an erotic film 
than to a control film.

Fisher and Osofsky (1968), in an experiment
comparing reactions in a control session
and after gynecological exams, found
that the latter produced significant heart rate 
increases. As was mentioned in the GSR sec- 
tion, it is not clear whether this autonomic 
arousal was produced by anxiety, sexual 
arousal, or both. 

The research using heart rate as a measure 
of sexual arousal indicates that this measure 
is not very sensitive to sexual arousal prior 
to intromission or genital manipulation. 
Erotic motion pictures may stimulate a small 
increase in heart rate, but even these effects 
are not consistent across experiments. Part 
of the problem may lie in the biphasic nature 
of the heart rate response to exteroceptive 
stimuli, but in the one study where rates were 
taken every second the amount of acceleration 
was small and briefly sustained. Even with 
the highly erotic stimulus of a stag movie,
heart rate changes were minimal in most
subjects and, where [...] associated with a [...]

[...]action for both sexes with peaks as high as
40 per minute for both sexes.

Bartlett (1956) measured respiratory rate, 
minute volume, and tidal volume in couples 
during coitus. "A simple mouthpiece was
valved so that atmospheric air was inhaled, 
and the expired air was passed through a dry- 
gas test meter. Respiratory volumes were ob- 
tained by reading the volume of expired air 
at j-minute intervals. Respiratory rate was 
recorded on a smoked drum by a tambour 
which was attached to the exhalation side of 
the mouthpiece to record pressure changes. 
From a knowledge of the minute volume and 
rate, the tidal volumes were calculated. The 
nose was lightly clamped to prevent an error 
in the measurement of the expired air [p. 
469]." One must admire the heroic perform- 
ance of Bartlett's subjects under these condi- 
tions of recording. The results on the three 
respiratory measures were similar to those 
for heart rate: (a) fluctuations before in- 
tromission but no accelerating trend until 
after intromission, (b) marked peaks at orgasm
with rates of 20 to 70 reached during
orgasm, (c) a parallelism of male and female 
rates. The authors speculated that this ex- 
treme hyperventilation at orgasm could ac- 
count for the partial lapse of consciousness in 
some persons at this time. 
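The calculation of tidal volume from minute volume and respiratory rate, as described in the quoted procedure, amounts to a single division. The sketch below assumes volumes in liters and rate in breaths per minute; the example values are hypothetical and not taken from Bartlett's data.

```python
def tidal_volume_liters(minute_volume_liters, breaths_per_minute):
    """Average volume per breath, given expired volume per minute and rate."""
    if breaths_per_minute <= 0:
        raise ValueError("respiratory rate must be positive")
    return minute_volume_liters / breaths_per_minute


# Hypothetical reading: 30 liters expired in a minute at 20 breaths per minute.
print(tidal_volume_liters(30.0, 20.0))  # 1.5
```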

Returning to the more gentle arousal in- 
duced by visual stimuli, we find breathing 
measures considerably less responsive. Davis 
and Buchwald (1957) measured breathing 
cycle time (duration of first breathing cycle 
in the stimulus interval) and maximum 
breathing amplitude of inspiration or expira- 
tion during the stimulus interval. Breathing 
cycle amplitude was reliable across pairs of 
pictures of similar content; breathing cycle 
time was not reliable. One of the sexual 
stimuli elicited a strong response on both 
measures, the other did not. 

Wenger et al. (1968) found no differences 
between respiration rates recorded while read- 
ing erotic and control passages. Hain and Lin- 
ton (see Footnote 5) measured depth of 
respiration and the inhalation rates (time
taken for largest depth of respiration divided 
by the time taken from the complete inhala- 
tion-exhalation cycle) in response to pictures 
of male and female nudes. Neither measure 
yielded a difference between male and female
nudes. The I-fraction showed no habituation,
but the depth of respiration measure dimin- 
ished with repeated stimulus presentations. 
Koegler and Kline (1965) reported that res- 
piration changes were rarely seen in response 
to erotic or stressful movies. Corman (1968) 
whose experiment was described in the pre- 
ceding section, found no significant changes in 
respiratory rate or variability produced by 
slides of nudes or an erotic movie. When 
base-line respiratory measures were compared 
to the scene designated by each subject as 
the most subjectively arousing, significant de- 
creases in rate and increases in respiratory 
variance were found. Romano (1969) also 
found no effects of this erotic film on respira- 
tion rate or variability. The only positive re- 
sult reported for respiration rate was in the 
study by Roessler (see Footnote 6) who found 
a greater increase in response to his erotic 
film than to his control film. Apparently 
respiratory measures will not be useful in as- 
sessing sexual arousal to visual stimuli. 


Penile Erection Measures 


Masters and Johnson (1966) have sum- 
marized what is known about the anatomy 
and physiology of the penis. Running through 
the body of the penis are three cylindrical 
bodies of erectile tissue. Two of these cylin- 
ders, the corpora cavernosa, lie parallel to 
each other, and a third, the corpus spongiosum,
runs along the ventral portion of the penis 
and contains the urethra. Stimulation of the 
splanchnic nerves dilates the penile arteries,
blood flows into arterioles in the corpora 
cavernosa and fills the sinuses. A center for 
reflex erection is said to exist in the sacral 
section of the spinal cord. (In Hohmann's 
study of the effects of spinal cord lesions, four 
of five cases with damage in this area were 
incapable of reflexive erections.) Of course, 
stimulation of erection may also be directed 
from higher cortical centers. "Erection is lost
when the sympathetic nerve supply causes 
constriction of the penile arteries [p. 179]." 
Active constriction of the arterioles allows the 
trapped blood to escape from the cavernous 
sinuses through the penile veins. Thus, penile 
erection is a function of a localized vasodila- 
tion stimulated from spinal or higher neural 
centers. Sympathetic innervation results in a
vasoconstriction which may inhibit erection 
or cause detumescence following erection. This 
description fits Wenger's theory of the pre- 
dominance of parasympathetic activity in the 
initial phase of sexual arousal. However, the 
apparent sympathetic dominance during the 
plateau and orgasmic phases leading to ejacu- 
lation does not seem to interfere with erec- 
tion. The relative roles of sympathetic and 
parasympathetic systems are obviously complex
and some local autonomy of autonomic re- 
sponse is apparent. Sensory distractions, such 
as loud sudden noises, stimulating the central 
nervous system, may also impair penile erection.

Masters and Johnson have pointed out that 
penile erection may occur in states other than 
sexual excitement. One of these states, the 
rapid eye movement (REM) arousal stage of 
sleep, is discussed later in this section. How- 
ever, in the presence of sexual stimulation, 
penile erection would seem to have some “face 
validity” as a specific measure of sexual
arousal in the male. (It is conceivable that a 
male may be psychically aroused, but penile 
erection may be inhibited by anxiety-related 
sympathetic system reaction. This is one rea- 
son why the term “arousal” must be qualified 
by operational definition.) Several investi- 
gators have developed devices for measuring 
penile erection, and these are discussed in the 
following paragraphs. 

Freund, Sedlacek, and Knob (1965) have 
described a transducer for mechanical plethys- 
mography of the male genital. The penis is 
inserted through a flat, soft sponge-rubber 
ring and an elastic rubber tube, made from a 
condom, into a glass cylinder. The cylinder 
tapers down at the end to a narrow funnel 
which connects by tube with the volumetric 
instrument. The sponge-rubber ring which 
acts as a pad for the cylinder is fitted on the 
penis. The glass cylinder is attached to the 
body of the subject with straps. The elastic 
cuff is inflated with air to fill up the broad 
end of the cylinder to make its base airtight. 
The supply of air to the cuff is shut off, and 
the funnel of the cylinder is connected by 
tube to the volumetric device. Freund et al.’s
study provides a diagram of the device and
instructions for its construction. 

Freund has applied his device to the diagnosis
of various sexual deviancies including
homosexuality and pedophilia. The technique 
consists of exposing subjects to pictures of 
nude males and females in five age cate- 
gories ranging from children to adults. 

Penile volume recordings are made to 
measure reactions to pictures in each of the 
categories. In the first study (Freund, 1963) 
the method was applied using 58 homosexuals 
and 65 heterosexuals. These groups were fur- 
ther subdivided into those who preferred 
adults, adolescents, or children as sex objects. 
Each picture was exposed for 13 seconds with 
an interval of 19 seconds between pictures. 
If the tracing of penile volume was still fall- 
ing or rising, or if it was still some distance 
from its original level, the presentation of the 
next stimulus was postponed. Summed reac- 
tions in each category of pictures correctly 
diagnosed 48 of the 58 homosexuals and all 
65 of the heterosexuals. Of the 10 misdiag- 
noses of homosexuals 6 occurred in records 
where responses to the stimuli were flat, or 
almost flat. There was also significant agree- 
ment among the age preferences in sexual ob- 
jects and reactions to adult and child figures 
of the appropriate sexes; that is, subjects who 
preferred adolescents or children showed greater
penile volume responses to younger age figures 
than to older age figures. A retest of 86 cases 
showed high consistency of the size of volume [...]

[...] heterosexual controls who were alcoholics. Alcoholics
are not a good control group for
studies of sexual arousal. The cirrhosis in 
chronic alcoholics may cause difficulties in 
the metabolism of estrogen, resulting in high 
plasma levels and testicular atrophy (Korenman,
Perrin, & McCallum; see Footnote 8). In the new
modification of the techniques, 20 male and 30 
female pictures are used and divided into 
three age groupings. Each picture is exposed 
for 7 seconds, and measurement of volume is 
taken at the beginning and end of the ex- 
posure, and at a third time 7 seconds after 
the slide is turned off. New slides are not 
presented until reaction has subsided. Diag- 
nosis is made using proportions of 10 slides 
of highest response which are either male or 
female or which fall into the adult or child 
groups. In all 40 subjects in this experiment, 
the proportion of female to male pictures 
stimulating high reaction exceeded 2 to 1. 
None of 35 homosexual males reached this 
proportion. When the two groups of this 
study, pedophiliacs and normal heterosexuals, 
were compared for age of high-reaction pic- 
tures there was little overlap or misdiagnosis. 
Freund (1965, 1967) has extended his studies 
to more groups of pedophiliacs and homosexuals
with results showing good differentiation
of age and sex preferences in sexual objects.
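The proportion-based diagnosis described for the modified technique can be sketched as follows. The decision rule (female-to-male count among the 10 highest-response slides exceeding 2 to 1) is one reading of the text, and the data layout, function name, and example values are hypothetical.

```python
def classify_by_top_responses(responses, top_n=10, ratio=2.0):
    """Apply a 2-to-1 proportion rule to the highest-response slides.

    responses: list of (label, magnitude) pairs, label "female" or "male".
    Returns (exceeds_ratio, n_female, n_male) for the top_n slides.
    Assumption: "exceeded 2 to 1" is read as n_female > ratio * n_male.
    """
    top = sorted(responses, key=lambda r: r[1], reverse=True)[:top_n]
    n_female = sum(1 for label, _ in top if label == "female")
    n_male = len(top) - n_female
    return n_female > ratio * n_male, n_female, n_male


# Hypothetical record: 8 strong responses to female slides, 2 to male slides,
# and weak responses to the remaining slides.
record = (
    [("female", 5.0)] * 8
    + [("male", 4.0)] * 2
    + [("male", 1.0)] * 10
    + [("female", 0.5)] * 5
)
print(classify_by_top_responses(record))  # (True, 8, 2)
```

The same function could be applied to age groupings by relabeling the slides; that extension is not shown here.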

The successful simulation of some of
Freund’s subjects raises the question of
whether penile erection or inhibition of erection
can be voluntarily controlled. Laws and
Rubin (1969) tested the ability of normal 
subjects to inhibit erection while watching an 
erotic film and to produce erection without 
an external stimulus. Four subjects who de- 
veloped full penile erections when watching 
erotic films were able to inhibit erection when 
instructed to do so. Subjects differed in their 
ability to inhibit erection, but when erection 
occurred during “inhibit” instructions it was 
always of less magnitude and longer latency 
to peak than in the “no inhibit” conditions. 


8 Korenman, S. G., Perrin, L. E., and McCallum, T. 
Estradiol in human plasma: demonstration of elevated
levels in gynecomastia and in cirrhosis. Paper
presented at the 61st annual meeting of the American
Society for Clinical Investigation, Atlantic City, May 
1969. 
All subjects reported that they inhibited their
erections by concentrating on irrelevant, non- 
sexual mental tasks, for instance, doing multi- 
plication tables. Instructions to produce erec- 
tions in the absence of stimulation resulted in 
weak erections of short duration. No subject 
was able to maintain any level of partial 
erection for more than a few minutes. Sub- 
jects induced erections by concentrating on 
sexual thoughts. The results of these experi- 
ments show that penile erection is under 
some voluntary control which subjects exert 
indirectly through their mental activity rather 
than directly through muscle control. While 
distraction is very successful in inhibiting 
erection, fantasy does not seem to produce
in any subject a magnitude of erection
comparable to that produced by
visual erotic stimuli. It would be interesting to 
see how much inhibition or facilitation of 
erection could be accomplished by feedback 
of erection information combined with operant 
reinforcement. The results of such studies 
might be useful in the treatment of sexual 
deviates. The efforts in this area have so far 
used classical conditioning methods. 

McConaghy (1967) has also described a 
simple penile plethysmograph device. The 
end of a finger stall was cut off, and the cut 
end was stretched over the open end of a 
cylindrical tin. A nipple was soldered into 
the closed end and connected by a plastic 
tube to a Grass pressure transducer. The penis 
was inserted through the open end of the 
finger stall which maintains an airtight con- 
nection. The author used movies of singly 
appearing nudes, 10 males and 10 females, 
engaged in nonsexual activity. These stimuli 
were shown at 1-minute intervals and incorporated
into a travelogue-type of movie.
The subjects consisted of 22 male homo- 
sexuals referred for aversion-type behavioral 
therapy and 11 heterosexual medical students. 
Of 19 homosexuals 14 showed a greater re- 
sponse to male nudes than to female nudes, 
and 10 of 11 heterosexuals showed a greater 
response to female nudes. 

Bancroft, Jones, and Pullan (1966) de- 
scribed a simple transducer for measuring 
penile erection. The device consisted of a 
strain gauge made of 18 centimeters of sili- 
cone rubber tubing filled with mercury and 
fitted with platinum electrodes. Two transistors
were mounted in a brass block. The circuit
was powered by two 1.5-volt batteries
which were kept outside of the box. (A circuit 
diagram was provided in the article.) The 
strain gauge was fitted around the penis and
changes in circumference of the penis were 
registered on a 50 microammeter as changes 
in current. The initial tension could be stan- 
dardized by tightening the gauge and setting 
the reading to zero. The strain gauge could 
easily be put into position by the subject and 
worn under normal clothes. The authors re- 
ported that movement artifact was not a 
problem with the subjects comfortably seated. 
With the gain setting used, a change of 2 
microamperes is equivalent to an alteration of 
.63 millimeters in circumference of the penis.
The authors reported an average increase of 
about 25 millimeters in full erection. A case 
was described where the device was used to 
measure changes in arousal during aversive 
conditioning in behavioral therapy of a pedo- 
philiac. The use of the device in the electric 
aversion treatment of sexual deviants was de- 
scribed in two articles (Bancroft & Marks, 
1968; Marks & Gelder, 1967).
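The meter calibration quoted for this device can be turned into a simple conversion. The sketch assumes the garbled figure is read as 0.63 millimeters of circumference per 2 microamperes and that the relation is linear over the whole range; neither assumption is confirmed by the source, and the function names are illustrative.

```python
# Assumed calibration: 2 microamperes corresponds to 0.63 mm of circumference.
MM_PER_MICROAMP = 0.63 / 2.0


def circumference_change_mm(delta_microamps):
    """Convert a change in meter current to a change in circumference (mm)."""
    return delta_microamps * MM_PER_MICROAMP


def microamps_for_change(delta_mm):
    """Meter deflection expected for a given circumference change (mm)."""
    return delta_mm / MM_PER_MICROAMP


print(circumference_change_mm(2.0))
# Deflection implied by the reported average full-erection increase of 25 mm.
print(round(microamps_for_change(25.0), 1))  # 79.4
```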

Barlow (see Footnote 9) and his colleagues described a
strain gauge for measuring penile volume
changes. They claimed their device was less
cumbersome and restrictive than Freund’s.
The authors stated that in Bancroft’s device
the mercury tended to separate at the upper
range of volume displacement. They also
claimed Bancroft’s device was temperature 
sensitive and somewhat difficult to build and 
seal. Their own device consisted of a simple 
rugged strain gauge encompassed in a ring of 
plastic material. The ring surrounded the 
penis, but they claimed it did not constrict 
and caused no discomfort. Changes in the 
diameter of the ring were recorded on a 
Grass preamplifier. Recordings from the ring
were said to be linear with volume changes
within a range of 25-40 millimeters.

The authors reported using the device to
measure changes in response to colored slides
of nudes. They noted that volume displacement
may take as long as 5 minutes to return
to base line after stimulation, and it may not
return to the previous base line, but may
level off at a greater displacement, necessitating
a resetting of the base line. Measurements
on seven subjects over a period of 18
months yielded stable and reliable individual
reactions. Reliable increases of about 1 millimeter
may be recorded even though the subject
is unaware of the change.


9 D. H. Barlow, personal communication, July 17,
1969.

Abel, Levis, and Clancy (see Footnote 10) used a modification
of the penile transducer developed by
Bancroft et al. (1966) in a preliminary study
of the effects of aversive therapy on sexual 
deviants. Voyeurs, exhibitionists, and trans- 
vestites were used as subjects. Subjects made 
tapes which recorded incidents of their deviant 
sexual behavior. Shock was used as an un- 
conditioned stimulus to parts of the deviant 
tapes. Before treatment strong sexual arousal 
to deviant and nondeviant tapes was mea- 
sured with the penile transducer. Stronger re- 
sponse was associated with deviant than with 
nondeviant tapes. One week after the condi- 
tioning treatment the responses to the shock- 
conditioned deviant tape and nonconditioned 
parts of the deviant tapes were minimal, while 
the response to the nondeviant tape remained 
strong. Eight weeks after treatment the responses
to the deviant tapes were still minimal
while the responses to nondeviant tapes were 
even stronger. 

Other methods of measuring penile erection 
have been devised by sleep researchers in- 
terested in the association between penile erec- 
tions during sleep and the REM stage of 
sleep. Fisher, Gross, and Zuch (1965) have 
experimented with a number of methods of 
recording penile erection during sleep. Their 
first attempt consisted of a polyvinyl tube 
about the size and shape of a doughnut. The
tube was filled with water and fitted around 
the base of the penis. During erection the 
pressure on the tube resulted in a rise in the 
water level of a smaller tube attached to it. 
Apparently the size and bulk of the tube 
caused stimulation of the penis. Their second 
attempt involved the use of a small thermistor 
attached to the penis which recorded the 
changes in temperature produced by the in- 
creased blood flow during erection. This de-
vice proved to be too difficult to keep at- 
tached to the penis, and valid recordings were 
obtained in only 2 of 17 cases. The most suc- 
cessful device proved to be a mercury strain 
gauge developed by Shapiro and Cohen at 
the Downstate Medical Center. This device 
consisted of an elastic silicone plastic tube 1
millimeter in diameter which was filled with 
mercury and sealed at both ends with plati- 
num electrodes to form a loop. The gauge 
formed one leg of a Wheatstone bridge circuit,
and minute variations in resistance were mea-
sured as the tube was stretched during erec-
tion. The resistance changes were monitored
on an Offner Type T electroencephalograph
through a DC amplifier. The strain gauge may
be calibrated by measuring the amount of
deflection on the graphic tracing per unit
change in circumference of the gauge as it
is moved down on a tapering cone-shaped 
device graduated in centimeters. The authors 
stated that the rise in tracing was roughly 
linear to increase in circumference. Degree of 
erection for each subject was estimated by 
having the subject take measurements of the 
flaccid and erect penis. Using the more sensi- 
tive gain, minute increases of fractions of a 
millimeter were measurable. Using all devices, 
full or partial erection was found to be as- 
sociated with 95% of the 86 REM periods 
studied. A close temporal relationship between 
REM periods and erections was noted. In- 
creases in circumference of 2 centimeters or 
more were found to represent full or nearly 
full erection in most subjects. Partial erec- 
tions ranged from 2 millimeters to 2 centi- 
meters. 
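
The calibration and scoring rules just described can be illustrated with a brief sketch. It assumes, as the authors reported, that the tracing rises roughly linearly with circumference; the function names and the cone readings are hypothetical and are not taken from Fisher et al. (1965). Only the 2-millimeter and 2-centimeter cutoffs come from the text above.

```python
# Illustrative sketch only: names and calibration readings are hypothetical.
# Assumes pen deflection is approximately linear in gauge circumference.

def fit_calibration(circumferences_cm, deflections_mm):
    """Least-squares line: deflection = slope * circumference + intercept."""
    n = len(circumferences_cm)
    mean_c = sum(circumferences_cm) / n
    mean_d = sum(deflections_mm) / n
    sxx = sum((c - mean_c) ** 2 for c in circumferences_cm)
    sxy = sum((c - mean_c) * (d - mean_d)
              for c, d in zip(circumferences_cm, deflections_mm))
    slope = sxy / sxx                      # mm of deflection per cm of circumference
    intercept = mean_d - slope * mean_c
    return slope, intercept


def circumference_change_cm(deflection_change_mm, slope):
    """Convert a change in pen deflection into a change in circumference."""
    return deflection_change_mm / slope


def classify_erection(increase_cm):
    """Apply the cutoffs quoted in the text: 2 cm or more is full or nearly
    full; 2 mm up to 2 cm is partial; anything smaller is scored as none."""
    if increase_cm >= 2.0:
        return "full"
    if increase_cm >= 0.2:
        return "partial"
    return "none"


# Hypothetical readings taken as the gauge is moved down the graduated cone.
cone_cm = [8.0, 9.0, 10.0, 11.0]
pen_mm = [0.0, 5.0, 10.0, 15.0]
slope, intercept = fit_calibration(cone_cm, pen_mm)
print(classify_erection(circumference_change_cm(10.0, slope)))
```

With these invented readings the slope is 5 millimeters of deflection per centimeter, so a 10-millimeter rise in the tracing corresponds to a 2-centimeter increase in circumference.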

Shapiro and Cohen’s strain-gauge device 
was used in a study of the relation between 
the erection cycle during sleep and dream 
anxiety (Karacan, Goodenough, Shapiro, &
Starker, 1966). Most, but not all, of the REM
periods were accompanied by erection. Of
those periods that were accompanied by
erection, 95% yielded dream reports, as op-
posed to 85% of nonerection REM periods.
REM awakening reports with a high anxiety 
content were less likely to be accompanied 
by erection. 

The techniques of pneumatic plethysmog- 
raphy were discussed by Lader (1967). Until
recently these methods have been primarily
used with a digit, usually a finger. The ap- 
plication of this method to the penis, as in 
the methods of Freund and McConaghy, may 
create certain kinds of problems. Since the 
penis is quite sensitive to stimulation, the 
stimulation from the device itself could result 
in an initial arousal reaction. Asking subjects
to insert their penises in such devices might 
cause some anxiety in subjects. Shapiro and 
Cohen’s and Bancroft’s strain-gauge devices 
would seem to be preferable since they are 
simpler to apply and do not stimulate as large 
an area of the penis. Furthermore, calibration 
is simpler with a near linear relationship be- 
tween circumference and change in current. 
Freund and McConaghy seem to rely on rela- 
tive measures of penile volume rather than 
measures on a known scale. Freund has in- 
dicated some problem with movement arti- 
facts which he says are detectable as rapid 
oscillations. The strain-gauge method is prob-
ably less vulnerable to such movement artifact,
which makes it the device of choice in sleep
studies where the subject would be turning 
considerably during the night. However, de- 
spite the drawbacks of the pneumatic plethys- 
mographic devices, they have proven to be 
highly reliable and discriminating in studies 
of sexual arousal, more so than any other 
physiological method discussed in prior sec- 
tions. Measures of penile erection will prob- 
ably be the methods of choice in future studies 
of sexual arousal in the male. Is there a simi- 
lar specific measure of arousal available for 
the female? Here is an area where the meth- 
odology is currently evolving and mostly un- 
published. The work in this area is described
in a later section. 


Scrotum and Testes 


Masters and Johnson (1966) have noted 
changes in the scrotum and testes during 
sexual arousal. During the excitement phase 
there is a tensing and thickening of scrotal 
integument, an effect of localized vasoconges- 
tion, and contraction of the smooth-muscle 
fibres of the dartos layer. There is an eleva- 
tion of both testes toward the perineum, ac- 
complished by shortening of the spermatic 
cords. The constricted scrotal sac provides 
secondary support to the reaction of testicular 
elevation. The testicular elevation progresses
during the plateau phase until preejaculatory 
positioning where the testes are in apposition 
with the male perineum. Masters and Johnson 
stated that this elevation of the testes is es-
sential to ejaculation. 


If the testes do not undergo at least partial elevation
the human male will not experience a full ejaculatory
sequence . . . rise to a position
of close apposition to the male perineum, an orgas- 
mic phase is certain to follow if effective sexual 
stimulation is maintained. Full testicular elevation is 
ejaculation [p. 208]. 


Another arousal phenomenon is an increase 
of testicular size (about 50%) also attributed 
to the vasocongestive reaction. 

Bell and Stroebel¹¹ have attempted to re-
cord the scrotal and testicular reactions. In
addition to the penile strain-gauge developed 
by Shapiro and Cohen (see prior section) they 
have used a scrotal strain-gauge placed around 
the neck of the scrotum. Muscle activity was 
measured in the cremaster and dartos muscles 
using Beckman electrodes placed slightly be- 
low the lower inguinal ring and right and left 
about two-thirds of the 


have reported the fol- 
lowing in a personal communication: 


correlated 
the data 
localized 
dartos activity preceding retraction and GSR. There
is a suggestion of dartos rhythm between one and 
three seconds of each cycle. It tended to be pro- 
nounced at the beginning of recording sessions; how- 
ever it gradually disappeared. We are not sure yet 
an artifact or real activity. We have 
seen it in two-thirds of the subjects. We find one 
to three seconds per fluctuation; amplitude decreases 
as session proceeds [see Footnote 11].


Further results on this new psychophysiologi- 
cal method should be of great interest to in- 


11 A. T. Bell and C. F. Stroebel, personal communi-
cation, May 20, 1969.
vestigators interested in sexual arousal. The
fact that the authors found responses during
emotionally charged interviews suggests that
this measure may be a generalized stress in-
dicator rather than a specific measure of sexual
arousal. 


Vaginal Blood Flow 


Masters and Johnson (1966) have found 
that the first physiological evidence of the 
human female's response to any form of sexual 
stimulation is the production of vaginal lubri-
cation. They suggested that this “sweating”
phenomenon is not glandular but is the result
of marked dilation of the venous plexus which
encircles the entire vaginal barrel. “Apparently
the transudation-like material which lubricates
the vagina develops from the activation of a
massive localized vasocongestive reaction [p.
70].” It is notable that the transudate ap-
pears even in artificial vaginas, which are
sections of bowel transplanted with blood ves-
sels intact to the vagina site. These vaginas
have no connection with the cervix.


vaginal wall from a normal 
purplish-blue to the darker purplish hue of 
vasocongestion. The labia majora and labia 
minora also show color and size signs of 


in the diameter of the clitoral shaft and an 
increase in diameter and length of the shaft. 
there is a with- 
and a retraction 
the symphysis, 
DiBianco, and Rosen 
whether the association 
between penile erection and REM periods in 
i a parallel in the female 
vaginal responses. They first attempted to
rom the vaginal wall. 
electrode held in a 
also measured. These procedures have not
produced reliable data because of movement 
and position artifact, and because the mea- 
surable responses were not large enough to 
be differentiated from the basal levels in- 
volved. For some subjects the process of at- 
taching or inserting electrodes proved to be 
erotically stimulating so that arousal was 
present before measurement was started. 

In order to induce arousal the authors se- 
lected female subjects who could sexually 
arouse themselves through fantasy. The sub- 
jects indicated when they were “turned on” 
or “turned off” by pressing a button which 
marked the record. At times the reading of 
erotic materials was used as a stimulant. 

The last and, to date, most successful of
the methods was a thermal flowmeter designed 
to measure changes of blood flow in the 
vaginal wall. A vaginal diaphragm with the 
center cut out was used to hold the measuring 
instrument in place. Subjects were fitted with 
the proper size diaphragm and inserted the 
diaphragm themselves. This ingenious method 
solved the problem of holding a sensor in 
place and reduced the problems of stimula- 
tion and movement artifact. Two thermistors 
mounted in a plastic holder about 1 centi- 
meter apart were attached to the outside of 
the ring and held against the vaginal mucosa 
by the diaphragm. The matched thermistors 
were operated in a low current DC bridge in 
such a way that ambient temperature changes 
did not affect the bridge output. One of the 
thermistors was heated a few degrees above 
body temperature by square wave pulses to 
a heating element embedded in its glass en-
velope. After initially establishing a heating 
level and rebalancing the bridge to zero out- 
put, an electronic negative feedback loop was 
closed which then varied the external heating 
to the thermistor, maintaining the null out- 
put condition. Since changes in blood flow 
alter the thermal conductivity of the tissue 
in contact with the heated thermistor, moni- 
toring the power (heating) supplied to this 
thermistor yielded an indication of relative 
blood flow changes in the underlying tissue 
system. 
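
The null-balance principle of the flowmeter can be sketched numerically. The sketch below is not the authors' circuit. It assumes a simple first-order model in which the heated thermistor's excess temperature equals heater power divided by the thermal conductance of the adjacent tissue, and it uses a plain integral-style correction in place of the electronic feedback loop. The conductance values, setpoint, and gain are hypothetical.

```python
# Illustrative sketch only: a first-order thermal model with made-up numbers.
# Assumption: excess temperature (deg C) = heater power (mW) / conductance (mW per deg C),
# where conductance rises as blood flow in the underlying tissue rises.

def servo_heater_power(conductances_mw_per_c, setpoint_c=2.0, gain=0.5, steps=200):
    """Return the heater power (mW) at which the loop settles for each
    successive tissue conductance value."""
    power_mw = 0.0
    settled = []
    for g in conductances_mw_per_c:
        for _ in range(steps):
            excess_c = power_mw / g                    # temperature above the reference thermistor
            power_mw += gain * (setpoint_c - excess_c)  # push the bridge back toward null
        settled.append(power_mw)
    return settled


def relative_flow_index(powers_mw):
    """Express each settled power as a ratio to the first (baseline) value."""
    baseline = powers_mw[0]
    return [p / baseline for p in powers_mw]


# Hypothetical rise in tissue conductance as arousal is reported.
powers = servo_heater_power([1.0, 1.5, 2.0])
print([round(x, 3) for x in relative_flow_index(powers)])
```

Under this model the settled power equals conductance times the setpoint, which is why the monitored heating power gives a relative, not an absolute, indication of flow. The simple correction used here converges only while the gain is smaller than twice the conductance.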

The results obtained from a small number 
of subjects seem to indicate that the vaginal 
blood flow technique is sensitive to reported 
changes in sexual arousal. At present it is not
certain how generalizable these results are, and 
work is going on at the Downstate Medical 
Center to perfect the technique. The use of 
the diaphragm ring to hold electrodes has led 
to a reexamination of the other techniques, 
vaginal pH and conductance measurements, 
which were given up in part because of the 
problems of holding electrodes in place. 

Tart¹² has invented a device to measure
blood flow in the clitoris: a “clitoroplethysmo- 
graph.” The device was attached to a vaginal 
stabilizing rod which fits into the vagina. The 
device was mounted surrounding the clitoral 
tissue mass. A photocell recorded blood volume 
changes in the clitoris. Provision was also 
made for a photocell in the stabilizing rod, to 
measure vaginal blood flow, and silver cloth 
electrodes for impedance plethysmography 
voltage or resistance measurement. 

A modification of Tart’s device using only 
the photoelectric cell for the vagina has been 
developed by Fisher and Davis¹³ at Mount
Sinai Hospital. A photocell was used to record 
fluctuations in light transmission through tis- 
sue. The light reflected back to the photocell 
from the vaginal capillary bed was measured. 
A solid state light source was used instead of 
incandescent bulbs which generate heat. The 
light source emitted light in the red spectrum 
without heat. The complete device also in- 
cluded a thermistor to measure temperature 
changes and a strain gauge to indicate intra- 
vaginal movements. Silicone rubber was used 
as a platform for the plethysmograph com- 
ponents in order to make the device com- 
fortable for the subject and sterilizable. Fisher 
and Davis investigated changes in vaginal 
blood flow during sleep. The investigators ex- 
pected to find a progressive increase in pulse 
volume during REM periods on the assump- 
tion that the vasocongestion seen in sexual 
arousal would characterize REM periods in 
females. They found marked fluctuations in 
REM periods rather than progressive changes, 
but these fluctuations were also found in a 
peripheral site, the toe. The implication is 
that the fluctuations in pulse volume were 


12 C. T. Tart, personal communication, April 29,
1969. 

13 D. M. Davis, personal communication, June 30,
1969. 
expressions of the general cardiovascular
changes seen during REM periods. 


Uterine Contractions 


Masters and Johnson (1966) stated that as 
excitement-phase levels of sexual tension pro- 
gress toward the plateau, the entire uterus is 
elevated from its position in contact with the 
posterior vaginal floor to a posterior and su- 
perior plane in the false pelvis. Full elevation 
is not achieved until the plateau phase. 
Corpus irritability increases from early in ex- 
citement and resolves into an identifiable 
contraction pattern that has specific orgasmic- 
phase orientation. 

A study by Bardwick and Behrman (1967) 
suggests that uterine contractions might have 
some significance for sexual arousal early in 
the excitement phase. They used 10 paid 
volunteers as subjects. On the basis of psycho- 
logical tests the subjects were divided into 
one group of subjects who were sexually 
anxious, passive, and neurotic, and another 
group who were not high on these traits. The 
subjects were studied at various points in
their menstrual cycles. Sexual stimuli used
included erotic passages from books, double 
entendre words, and cartoons from Playboy. 
The device used to measure uterine contrac- 
tions consisted of a thick-walled polyethylene 
tube connected to the tip of a rubber balloon 
which was inserted through the cervix into 
the uterus and filled with water. The pressure 
on the water was standardized to atmospheric 
pressure through a transducer and led into 
a Sanborn Type R recorder. A half hour was 
allowed for stabilization before stimuli were 
presented. Measures included amplitude, 
tonus (tonus as defined appears to be inter-
changeable with amplitude), contraction dura-
tion, and number of contractions. Menstrual 
cycle phase seemed to affect the duration of 
the contractions more than the amplitude or 
tonus. Psychological stimulation seemed to af- 
fect the amplitude and tonus, but not the 
duration of contractions. The authors stated: 


The present data indicate that the uterus, at any 
cycle phase will respond to anxiety and sexual stimu- 
lation with an increased mean amplitude and am- 
plitude variance . . . When the S [subject] data
were pooled, there were no consistent differences in 
uterine motility between sessions in which the psy-
chological stimuli had sexual content and those in
which the content was neutral. When the Ss were
divided into those who were highly sexually anxious, 
we found that the sexually anxious Ss reacted more 
strongly to the sexually relevant stimuli [p. 476]. 


Highly anxious women tended to extrude the 
intrauterine balloon while low-anxious sub- 
jects merely had uterine spasms without 
balloon extrusion. Balloon extrusion in high- 
anxious subjects occurred particularly when 
sexual material was being presented. 

The data from the experiment suggest that 
uterine contractions are a joint function of 
sexual arousal and anxiety. From the Bard- 
wick and Behrman article it is not clear if 
sexual arousal and anxiety reaction could be 
distinguished using this technique. Insertion 
of a balloon in the cervix is quite painful for
many women, which would limit the usefulness
of this technique. 


Temperature 


Masters and Johnson (1966) have noted
the “sex flush” that develops in response to 
sexual stimulation. In the female the flush 
appears late in the excitement phase, first 
over the epigastrium, then spreading rapidly 
over the breasts. In the plateau phase the 
flush is said to have a widespread distribution.
Males show no evidence of the flush in the 
excitement phase, but in some it appears over 
the epigastrium in the plateau phase and 
spreads to the anterior wall, near neck, face, 
and forehead. The sex flush is not universal, 
appearing in 75% of females and only 25% 
of males. However, the phenomenon suggests
that skin temperature measurements might 
have some usefulness as measures of sexual 
arousal. Masters stated that skin temperature 
changes are unreliable as measures of arousal, 
not being constant within subjects on different 
occasions.¹⁴

Wenger, Averill, and Smith (1968) mea- 
sured face and finger temperatures in their 
study of autonomic responses to erotic literary 
passages. Finger temperature showed a sig- 
nificant decrease during the reading of erotic 
materials; face temperature did not yield sig-
nificant differences between erotic and control
materials.


14 W. Masters, personal communication, May 1969.
Corman (1968) found similar results on
temperature measures. Face temperature was
not significantly affected by either slides of 
nudes or the erotic motion picture. Finger 
temperature showed a significant decrease in 
response to the motion picture, but no sig- 
nificant change in response to the Playboy 
nudes. Romano (1969) did not find significant 
effects of the same erotic movie on finger, 
face, or chest temperatures. 

Fisher et al. (1965) attempted to measure 
penis and groin temperature during REM 
phases of sleep. There was considerable dif- 
ficulty in keeping thermistors attached to the 
penis. An interesting finding of an inverse 
relationship between penis and groin tempera- 
tures (groin temperatures tend to fall when 
penile temperatures rise and vice versa) sug- 
gests that blood is withdrawn from adjacent 
areas to fill the penis during erections. 

Shapiro et al. (1968) have used a thermal 
flowmeter as an index of vaginal blood flow. 
Their technique was described in a prior 
section. 

Fisher and Osofsky (1968) measured head 
and leg temperatures of females during con- 
trol sessions and after gynecological examina- 
tions. Rectal and vaginal temperatures were 
measured only after the genital stimulation of 
the examinations. As was mentioned previ- 
ously, there is no reason to believe that the 
gynecological examinations were sexually 
arousing, although they did affect GSR, skin 
resistance, and heart rate. No changes were 
found in head and leg temperatures. Rectal 
and vaginal temperatures after the examina- 
tion were almost identical. Vaginal tempera- 
ture correlated significantly with GSR fre- 
quency, heart rate, and rectal temperature. 

Although not enough data are available on 
skin temperature as a measure of sexual
arousal in the excitement phase, it would ap- 
pear that such measurement would be valuable 
only on or in the genitals. The vasocongestive 
response during erection appears to involve 
some diversions of blood from surrounding 
tissues and a drop in skin temperature might 
be recorded from these tissues or even in more 
peripheral areas such as the finger. Care must 
be taken during temperature measurement to 
keep the ambient room temperature and 


humidity constant since skin 
changes are liable to be small. 


temperature 


Pupillary Response 


Studies by Hess and Polt (1960) and Hess, 
Seltzer, and Shlien (1965) have stimulated 
research in the use of pupillography as a 
method of measuring sexual arousal or in- 
terest. Hess (1968) has reviewed some of 
this work. The anatomy and physiology of 
the pupil are described in a chapter by Lowen-
stein and Loewenfeld (1962). The main func-
tion of the iris is to regulate the amount of 
light entering the eye, to increase depth of 
focus of the eye, and to reduce chromatic 
and spherical aberrations especially in bright 
light. The size of the pupil is controlled by 
the sphincter pupillae and the dilator pupillae.
The pupillary sphincter is an annular band 
of smooth muscle which encircles the pupil. 
It is activated by parasympathetic fibres and 
can constrict from 8 millimeters (dark) to 2 
millimeters (light) in seconds. The dilator 
pupillae are radial strands of smooth muscle 
which converge upon the pupil “similar to
wheel spokes.” These muscles are controlled
by sympathetic fibres. The size of the pupil 
is a function of spontaneous or reactive shifts 
of the dynamic equilibrium of sympathetic 
and parasympathetic innervation. Specific re- 
flexes are imposed on this equilibrium. In-
crease of light, and convergence of the eyes 
and accommodation of the lens when viewing 
a near object, cause contraction of the pupil. 
Decrease in illumination and sensory or emo- 
tional stimuli result in dilation. Dilation is the 
result of two neural effects: (a) sympathetic 
discharges which reach the dilator pupillae 
and cause it to contract and (b) inhibitory 
impulses which cause the sphincter pupillae 
to relax. 

The reciprocal innervations of the iris by 
the sympathetic and parasympathetic systems 
make it an organ of some interest to psycho- 
physiologists. However, the great sensitivity 
of the organ to changes in illumination has re- 
sulted in methodological problems which have 
not always been sufficiently recognized. This 
is particularly true when complex visual stim- 
uli are used to stimulate emotional changes. 
Woodmansee (1966) has pointed out the 
methodological problems in the use of pupil- 
lography to measure psychophysiological re-
actions. He noted that controlling the overall 
illuminance of visual displays is not sufficient 
because the pupil may contract 1% to 5% in 
size when the gaze shifts from a relatively
dark to a relatively brighter area of a test
stimulus. Since the reactions to emotionally
arousing stimuli rarely exceed 5%, they might
easily be accounted for by a subject’s shift 
of fixation. Pupillary responses are similar to 
GSR in the marked “arousal decrement effect” 
or habituation. In experiments where several 
control and test stimuli are presented, the 
major responses are to the first stimuli, and 
the differences between test and control stim- 
uli may become smaller as the subject be- 
comes less interested in the experiment. The 
near vision reflex can also account for shifts 
in pupillary size. If the subject focuses on 
test stimuli but allows his vision to blur on 
control stimuli, by fixating behind the plane 
of projection, constriction may occur on the 
control stimuli and dilation when he fixates 
the test stimuli. Finally, the author stated 
that the high variability of spontaneous pupil- 
lary activity (1% to 20%) can produce con- 
siderable “noise” in an experiment. The test- 
retest reliability of pupil size is said to be 
only about .30. However, the reliability of re- 
sponse to a constant stimulus may be greater. 
Bender (1933) found marked consistency of 
individual reactions in a day-to-day response 
to a standard light stimulus. Woodmansee has 
suggested various ways of reducing these extra 
experimental influences, and these are dis- 
cussed later after examination of the experi- 
ments which have used pupillography to mea- 
sure responses to sexual stimuli. 
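
Woodmansee's argument about fixation shifts reduces to a comparison of magnitudes, which can be sketched as follows. The function names and pupil diameters are hypothetical. The only figure taken from the text is the 5% upper bound on contraction attributed to a shift of fixation, and treating that bound as a symmetric screening threshold is an assumption of this sketch, not a procedure proposed by Woodmansee (1966).

```python
# Illustrative sketch only: names and diameters are hypothetical.

FIXATION_ARTIFACT_MAX_PCT = 5.0  # upper end of the 1% to 5% range cited in the text


def percent_change(test_diameter_mm, control_diameter_mm):
    """Pupil size on the test stimulus as a percentage change from control."""
    return 100.0 * (test_diameter_mm - control_diameter_mm) / control_diameter_mm


def exceeds_fixation_artifact(pct_change):
    """True only if the change is larger than a fixation shift alone could
    plausibly produce under the assumed 5% bound."""
    return abs(pct_change) > FIXATION_ARTIFACT_MAX_PCT


# A hypothetical 2.5% dilation cannot be separated from a shift of gaze
# between lighter and darker parts of the same slide.
small = percent_change(4.1, 4.0)
print(round(small, 1), exceeds_fixation_artifact(small))
```

On this screening rule most of the effects reported in the studies reviewed below would fall inside the artifact range, which is the substance of Woodmansee's objection.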

Hess and Polt (1960) presented “pilot” 
data on two females and four males. Paren- 
thetically, it is amazing how the labeling of an 
experiment as “pilot” has so little effect in 
inhibiting the tendency to play up the results. 
Generalizations about pupillographic sex dif- 
ferences based on these two females and four 
males have been widely promulgated despite 
the fact that the author has not yet published 
an extended study based on an adequate 
number of subjects.¹⁵ The stimuli in this ex-


15 In a personal communication (July 17, 1969)
Hess stated that the Hess and Polt (1960) study 
has been “consistently replicated” “with a few 
periment were pictures of a baby, a mother
and baby, a nude male, a nude female, and 
a landscape. The female subjects showed 
more dilation than males to the baby, mother 
and baby, and nude male pictures; the males 
dilated more in response to the nude female. 
The authors concluded “men are more in- 
terested in partially nude women, women are 
more interested in partially nude men [p. 
132].” While there might be some truth in 
this conclusion, the data presented are hardly 
sufficient to make such a sweeping generaliza- 
tion. 

Another “pilot study” by Hess and his co- 
workers (Hess et al., 1965) compared the 
responses of five known homosexual and five 
heterosexual males to 15 slides including 5 
nude males, 5 nude females, and 5 art slides. 
The authors made an index of the relative 
responses to male and female stimuli. All five
heterosexuals had a positive index, showing
relatively greater pupil dilation to females
than to males; four of the five homosexuals 
had a negative index, indicating greater dila- 
tion to nude males. The author's theory claims 
that dilation is an expression of positive 
arousal, and contraction is an indication of 
negative arousal. It is interesting, in view of 
this theory, that two of the five heterosexuals 
actually dilated to male stimuli, and only four 
of the five heterosexuals actually dilated to 
female stimuli. Also, while four of the five 
homosexuals dilated to male stimuli, three of 
the five also dilated to female stimuli. 
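
The distinction drawn here between a relative index and an absolute response can be made concrete. The text does not give the formula used by Hess et al. (1965), so the sketch below assumes the simplest form consistent with the description: mean percentage change to female slides minus mean percentage change to male slides. All names and values are hypothetical.

```python
# Illustrative sketch only: the index formula is an assumption, not the
# published scoring method, and the data are invented.
from statistics import mean


def relative_index(pct_change_female_slides, pct_change_male_slides):
    """Positive values mean relatively greater dilation to female slides."""
    return mean(pct_change_female_slides) - mean(pct_change_male_slides)


def dilated_to(pct_changes):
    """True if the mean response to a set of slides was an actual dilation."""
    return mean(pct_changes) > 0.0


# A hypothetical subject whose pupils constrict to both kinds of slide,
# but constrict less to the female slides.
female = [-1.0, -1.0, -1.0, -1.0, -1.0]
male = [-3.0, -3.0, -3.0, -3.0, -3.0]
print(relative_index(female, male), dilated_to(female), dilated_to(male))
```

Such a subject receives a positive index without having dilated to either class of stimuli, which is why the index alone cannot support the claim that dilation expresses positive arousal and contraction negative arousal.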

The assumption that contraction indicates 
negative arousal is questionable. There is little 
indication that anything but light or other 
visual reflexes can cause constriction. Bender 
(1933) studied the effect of loud noise (gun-
shot), pin-pricks, electric shocks, and the nega-
tive stimulus of a white rat presented just 
preceding or simultaneous with the exposure 
to light. All of the emotional stimuli resulted 
in inhibition of the normal contraction to 
light, with longer latencies, more extensive 
responses, and a longer time for the pupil to 
reach maximum contraction. While painful 


thousand subjects.” In a second communication
(August 5, 1969) Hess said that he has “personally
run several hundred subjects” and found similar
results.
stimuli cause some oscillation of response, the
dominant response in most subjects to most 
stimuli was a “dilatory” contraction. In no 
case did a stimulus do more than slightly de- 
lay the powerful contraction response to 
light. This maximal response is reached rather 
quickly without other stimulation: 3.5 sec- 
onds in four subjects and 3.0 seconds in two 
subjects. Emotional stimuli merely extend 
the duration of response a few seconds. It is 
easy to see how differences in illumination of 
stimuli presented could mislead investigators 
to assume they are measuring differences in 
emotional reactions. Hess claimed that il- 
lumination differences would affect all sub- 
jects similarly, but, as Woodmansee (1966) 
pointed out, shifts in fixation within each 
stimulus might vary from subject to subject. 

Hess (1968) has gone so far as to suggest 
that “pupil response can serve as a more
accurate representation of an attitude than
can responses to well-drawn questionnaires or
to projective techniques . . . [p. 580].” The
following studies suggest that this claim may 
be premature. Seven experiments have been 
published which provide data relevant to 
Hess's hypotheses: three of these used male 
subjects only, while four of them compared 
responses of males and females. 

Sims (1967) investigated the responses of 
12 pairs of married subjects to clothed pic- 
tures of men and women. There were two 
pictures of each sex: one in which the pupils 
of the picture subject were retouched to ap- 
pear dilated, and the other in which the 
pupils appeared constricted. The pupils of the 
subjects dilated significantly more in response 
to opposite-sex pictures than to same-sex 
pictures. Furthermore, the subjects showed
greater response to pictures of the opposite 
sex with dilated pupils than to pictures of 
persons with constricted pupils. The opposite 
was true for same-sex pictures. While the 
author did not use nude pictures, the results 
tend to substantiate the Hess hypothesis of 
greater pupil dilation to pictures of the op- 
posite sex. The results due to the portrayal of 
the pupils in the pictures tended to indicate 
that the dilated pupil in the opposite sex is 
an arousal stimulus. A paper by Hicks, Reany, 
and Hill (1967) tends to support the latter 
finding although preferences were expressed 
verbally, and pupil measurements were not
made in this study. 

Scott, Wells, Wood, and Morgan (1967) 
used four pictures in each of four categories: 
clothed males, clothed females, female nudes
(from Playboy), and seminude males (from
Muscleboy). Gray rectangles were used as 
control stimuli; these were trimmed so they 
reflected the same amount of light as the pic- 
tures. The pupillary responses of 10 male 
and 10 female undergraduates were com- 
pared. No significant effects were found at- 
tributable to sex of subject, sex of pictures, 
nudity of pictures, or any of the interactions 
of these variables. Four of 10 males dilated 
more to male seminude pictures than to fe- 
male seminudes; 3 of 10 females dilated more 
to the female than to the male pictures. These 
data clearly provide no support for Hess’s 
hypothesis of male-female differences. The 
second experiment tested the responses of five 
homosexual and five heterosexual males to 
these stimuli. Two of the five subjects in each 
group dilated more in response to male pic- 
tures than to female pictures. The difference 
between groups was not significant. A third 
experiment examined the responses of inde- 
pendent groups of 10 males and 10 females to 
(a) a pistol shot, (b) seminude male pictures,
and (c) seminude female pictures. Again, no sig-
nificant sex differences in response to the 
different stimuli were found. Males actually 
showed more dilation in response to male 
pictures than to female pictures. 

Peavler and McLaughlin (1967) examined 
the responses of four male and four female 
college students to various stimuli, one of 
which was a female nude “pin-up.” Control 
stimuli were blank slides made darker than 
the darkest points on the test stimuli. The 
mean response to the nude was 2.2% dilation 
while all other stimuli produced constriction, 
as would be expected from the brightness 
differences between the test and control stim- 
uli. No mention was made of sex differences. 
In the second experiment, words rated on a 
"good-bad" continuum were presented as 
visual stimuli. No relationship was found be- 
tween pupil diameter and rated evaluations 
of the words, casting some doubt on Hess’s
hypothesis of direction of arousal and dila- 
tion-contraction of the pupil. 
Lawless¹⁶ used 14 airmen and 7 female
nurses as subjects. The stimuli were photo- 
graphs and paintings and included male and 
female nudes and clothed figures. There were 
no significant differences in the relative male— 
female response (Hess’s index) of male and 
female subjects to male and female stimuli.
Male subjects actually showed significantly 
more dilation to male pictures than to female 
pictures, and reacted faster with a shorter 
latency to male pictures. Both groups reacted 
faster to nude figures than to clothed figures. 

Nunnally, Knott, Duchnowski, and Parker
(1967) examined the responses of 30 male
students to a series of four slides showing a
girl getting undressed. In the first picture the
girl was clothed, and in the last three pictures
she was in various stages of undress. Control
slides were numbers on a gray background
and were constructed darker than the test
slides, setting a bias against finding greater
dilation to the test stimuli. A significant in-
crease of 7% dilation change was found going
from the clothed-girl slide to the first stage
of undress. Further progressions in the strip-
tease produced no increase in dilation. All
test slides taken together produced signifi-
cantly more dilation than control slides. An-
other experiment showed that dilation was
produced by novel as opposed to nonnovel
stimuli, particularly on the first presentation
of a novel stimulus. Positive affect pictures
(faces of pretty girls) produced more dilation
than neutral or negative affect pictures (faces
with cancerous growths), but negative affect
did not produce constriction as postulated by
Hess. Dilation was also produced by muscle
strain, auditory stimuli, and expectancy of
gunshot. The results show that pupillary dila-
tion may be produced by many kinds of
arousal, one of which is the pleasant reactions
of males to semiclothed or pretty girls.

Bernick et al. (in press) measured pupil 
responses of nine male medical students to
slides of clothed males and females and to 
three movies: a heterosexual stag movie, a 
homosexual stag movie, and a suspense movie.
Brightness and contrast variations within each 


16 Lawless, J. C. Sex differences in pupillary re-
sponse to visual stimuli. Paper presented at the meet-
ing of Society for Psychophysiological Research,
Washington, D. C., October 1968.
subject were said to be effectively eliminated
by the illumination of the rear projection
screen to a constant minimum brightness
level. The brightest and darkest areas were
limited to a range of 24-26 footcandles. There
were no significant differences in pupillary
response to male and female slides. All of the
movies produced more pupillary dilation than 
the slides. Significant differences were found 
between 


stimuli. The lack 
female slides, and between heterosexual and 


pothesis (assuming that the subjects were not 
bisexual). 

The final study by Chapman, Chapman, and 
Brelje (1969) points up a variable that is too 


a younger undergraduate who dressed
formally and 
“breezy” manner, 
elicited greater dilation to female than to 
male slides in 14 out of 22 subjects; the in- 
formal examiner elicited greater pupillary dila- 
tion in 20 out of 


of response to the male stimuli, as well as to 
the female stimuli, was dilation, with 44 of 
47 male subjects showing dilation to male 


ing the differential response of males to
female stimuli.


The question raised by this 
experiment (assuming that the experimenter
difference was real and generalizable) is, How 
does a subject voluntarily inhibit pupillary 
response? The pupillary response is consid- 
ered to be involuntary. Perhaps the lack of 
response to pinups with a formal experimenter 
is due to a general inhibition of sympathetic 
activity through inhibition of fantasy response 
to the stimuli. The study bears replication 
with additional data on subjective responses 
to the stimuli. 

Simpson¹⁷ and his co-workers examined the
pupillary constriction response and the delay 
of ejaculation time in response to antidepres-
sive drugs. They found some suggestion of
negative correlation between these variables: 
as the ability of the pupil to constrict lessens, 
the ejaculation time tends to increase. 

In the experiments most relevant to Hess's 
hypotheses, those using male and female sub- 
jects, or heterosexual and homosexual subjects, 
only the study by Sims (1967) confirmed 
Hess's findings. From all of these studies, it 
would seem that pupillographic measures like 
other peripheral autonomic measures, such as 
GSR, are sensitive to the arousal produced by 
pictures of nudes or movies of sexual activity, 
but do not reflect the differential interest pat-
terns of males and females in male and fe-
male figures. Furthermore, there is no sup-
port of Hess's hypothesis that dilation reflects
positive interest and contraction negative in- 
terest. Rather, the magnitude of dilation 
seems to reflect sympathetic arousal value of 
novel, intense, or interesting stimuli. How- 
ever, Hess (1968) has stated that “all of our 
subsequent research, involving a large number 
of subjects has more than confirmed these 
initial findings [p. 575].” Hess should collate 
and publish these findings to resolve the 
doubts raised by the published research of
others. The results of the third study of Scott 
et al. (1967) suggest that GSR may be more 
sensitive to stimulus differences than pupil re- 
sponse. If this is true, one should consider the 
relative expense and methodological problems 
in the two methods. Neither seems to yield 
responses which discriminate between differ-


17 Unpublished study by G. M. Simpson, P. Harper,
& E. Beckles entitled “The Effects of Three Anti-
depressant Drugs on Pupil Response to Light and
Ejaculation Time,” 1969.
ent types of stimuli, and both show response
to novelty and rapid habituation. However, 
GSR is cheaper and there are fewer methodo- 
logical problems in the use of visual stimuli. 
Hess (1968) suggested that GSR might profit- 
ably be used to supplement pupillographic 
measurement. It should be pointed out that
some of the problems with pupillography
could be surmounted if auditory stimuli were 
used instead of complex visual stimuli, if 
lengthy trials were avoided, if the response 
were corrected for basal changes during the 
series, and if the reliability of responses to 
specific categories of stimuli were checked. 
The one hopeful note in the possibility of 
pupillographic response as a measure of sexual 
arousal is in the high correlations with re- 
ported erection in the Bernick et al. (in 
press) experiment. But these correlations were 
based on a sample of nine cases, and erec- 
tions were not actually measured or observed. 
These correlations certainly need replication 
with an adequate number of cases, and one of 
the objective penile erection measures men- 
tioned in the previous section on this method. 


Evoked Cortical Response 


Lifshitz (1966) has studied the effect of 
various kinds of pictorial stimuli on the aver- 
age evoked cortical response. Among other 
stimuli he used bland, scenic photographs, 
a “negative affective” series of photo-
graphs of ulcerated legs, and a “positive af-
fective” series of “art studies” of nude fe-
males. The subjects were 10 young males.
Slides were projected in focused or defocused 
presentations. It was possible to distinguish 
the focused or defocused presentations from 
the form of the average evoked cortical re- 
sponse, but the author could not tell which 
set of slides was being viewed from the aver- 
age evoked cortical response. 


However in any particular individual the form dif- 
ferences between the AERs [average evoked cortical 
responses] for the different subject slide groups 
tended to be consistent and in the four individuals 
who were subjected to repeated runs of the different 
slide groups it was possible, once their “code” was 
known, to tell from the AER which slide group 
they were looking at [p. 61] 


This is an interesting example of individual 
response specificity within a recording tech- 


Biochemical Determinations 


As was mentioned in the introductory sec-
tion on hormones, the author has failed to
find any studies of the effects of sexual arousal


testosterone from the gonads in the male, 
Seventeen ketosteroids, a metabolite of testos- 


sexual stimulation (Levi, 
periment examined the effect of high quality 
“love films” which
scenes without showing human sexual organs.
These scenes were shown to 15 female office
clerks. Urinary adrenaline and noradrenaline


erate and pleasant; 
adrenaline reactions were minimal or absent.
Levi then examined the effects of a more 
blatantly erotic movie, confiscated by the 
Swedish legal authorities and presented to the 
author for research purposes. The movie was 
shown to 53 females and 50 males, physio-
therapy and medical students. Urine samples
were collected for three 90-minute periods: 
one prior to the film, one during the film, and
one 90 minutes after the film. The first and
third samples were used to provide a base line
for measurement of catecholamine reactions
to the film. Both males and females showed
a significant increase in adrenaline, but the
increase in the male was significantly greater
than that in 


creatinine, 

Subjects also rated their subjective reac- 
tions. Both sexes 
sexual arousal during the film, but the in-
crease was significantly greater


were minor compared to pleasurable sensa- 
tions, in females the two kinds 


relations with other affects were not reported. 
In males these correlations were minimal and


Significant dif- 


sexual arousal self-rating data as supporting
hypothesis that men
more responsive to visual sexual stimulation
than are females. It should be noted


pathetic system arousal.
Bernick et al. (in press) measured


plasma 17-hydroxycorticoids (17 OHCS) in
eight male


second 
nificant, and there 
differences between reac-
tions to the three movies. The lack of adreno-
cortical reaction to the erotic movies in this
experiment is interesting in view of the marked
adrenomedullary response in the Levi experi-
ment. However, Levi used urine samples from
the entire period of stimulation to measure
catecholamines, whereas Bernick et al. (in
press) drew their second blood sample
after a 10-minute postmovie interview. Plasma
levels of hydrocortisone can change rapidly,
and the postmovie level may just as well have
reflected responses to the interview as
responses to the movie.

Clark and Treichler (1950) first suggested 
that urinary acid phosphatase (AP) may be 
an indication of sexual arousal in males; and 
Gustafson, Winokur, and Reichlin (1963) as- 
sessed AP and plasma nonesterified fatty 
acids (NEFA) as possible indicators of sexual 
arousal in men and women. AP in men may 
come from prostatic secretions and therefore 
might be influenced by sexual arousal. Stimu- 
lation of the prostate gland causes AP to be 
secreted into the urethra. The subjects con-
sisted of 17 men and 7 women. Three homo-
sexual men were also tested. The stimulus was
an 11-minute film portraying heterosexual re- 
lations. AP was determined from urine, and 
NEFA from blood samples taken before and 
after the film. Following the film there was a 
significant mean increase in urinary AP in 
men of 72% and a nonsignificant mean in- 
crease in women of 11%. Three of the five 
men who did not show the increase had higher 
initial levels than the other men, which may 
have limited their change (Law of Initial 
Values). Only one of the three homosexuals 
showed an increase in AP. Serum NEFA was 
tested in eight of the men, and only five of 
the eight had increased levels after the film. 
All subjects reported being sexually aroused. 
Eleven more male subjects were exposed to 
the film with concomitant electric shocks. 
Only 29% of this group had an increase in 
AP as opposed to 72% in the other hetero- 
sexual male group. The authors speculated 
that an emotionally induced cholinergic dis- 
charge over the nervi erigens stimulates secre- 
tion of epithelial cells of the prostatic acini 
and that the AP found in urine came from 
the prostatic secretion. The sex differences and 
differences between heterosexuals and homo-
sexuals in this experiment are interesting, but
like so many other experiments in this field 
the generality of findings is limited by the 
low sample number, inadequate matching of 
samples, and lack of comparisons with other 
types of stimulation. 

Barclay has used the AP measure in three 
studies attempting to validate various forms
of projective techniques. In the first study
(Barclay, 1969), an attempt was made to
arouse anger in college students in order to
demonstrate a connection between aggression
and sexual arousal.¹⁸ The subjects were fra-
ternity men and sorority women, and anger
was induced by an experimenter insulting the 
quality of students in these organizations. 
Urine samples were collected before the anger 
arousal and after TAT stories were written 
following the anger arousal. Males subjected 
to the anger arousal procedure showed more 
aggression in questionnaires and TAT stories 
than control males. Aroused males also had 
more sexual content on appropriate pictures 
and secreted more AP in urine than non- 
aroused males. 

In Barclay’s second study (1970) an at- 
tempt was made to test the response of 
urinary AP to more direct sexual stimulation. 
Fifty-five male subjects were assigned to one 
of three conditions: arousal with information, 
arousal with false information, and control. 
Subjects in the arousal conditions were shown 
pictures of nude females taken from nudist 
magazines; control subjects were shown pic- 
tures of buildings. The arousal with informa- 
tion group was told the purpose of the sexual 
stimulation, while the other arousal group 
was given false information in which the 
sexual arousal procedure was construed as ir- 
relevant to the urine collection. The author 
hypothesized that subjects who know their 
physiological sexuality is being assessed may 
become defensive and this may actually re-
duce their AP-measured arousal. The differ-
ences between the three groups on AP were
not significant. However, subjects differed in
their reported arousal by the pictures. Sub-


18 A recent finding by H. Persky of a substantial
positive correlation between testosterone produc-
tion rate and a questionnaire aggression scale in
young males tends to support this hypothesized
connection.
jects who rated the pictures as more sexually
arousing showed significantly more increase 
on AP than controls, while subjects who found 
few of the pictures arousing did not differ 
from controls. Unlike Gustafson et al. (1963) 
who found no relation between self-reported 
arousal and AP, Barclay found that the self- 
rated arousal was crucial. Perhaps this is be- 
cause the nude photographs used by Barclay 
were not as uniformly arousing as the sexual 
movie used by Gustafson. 

Barclay's third study¹⁹ did use an erotic
movie to test the hypothesis of AP as a mea- 
sure of sexual arousal and the influence of 
information or set on physiological sexual
arousal. This study also examined the effect 
of the subjects’ sexual experience on AP re- 
sponse. Nonexperienced subjects showed little 
or no AP response, regardless of whether they 
were stimulated or not, or what information 
was provided beforehand. Among the sexually 
experienced males the aroused, noninformed 
group showed a significant AP response, and 
the aroused informed and the control nonin- 
formed groups did not, as predicted by the 
authors. However a “paradoxical” finding 
emerged in the control-informed group, which 
knew that other subjects were watching 
erotic movies: these subjects showed an in- 
crease in AP following a nonerotic boring 
film. The author speculated that the informa-
tion alone may have stimulated erotic fan-
tasies in this group. High “sexual drive level”
subjects (drive measured by number of re-
ported orgasms per month) secreted sig-
nificantly more AP overall than low-drive
subjects. This last intriguing finding suggests 
that AP secretion may be related to sexual 
arousability as well as to sexual arousal. 

Barclay²⁰ reported a fourth study under-
way to test the specificity of AP secretion as
a measure of sexual arousal. The effects of 
sexual-, aggressive-, anxiety-, and euphoria-
arousing conditions on AP are being compared.

The findings from the series of studies by 
Barclay are complicated, implicating arousal 
conditions, information or set, subjects’ sexual 


19 Unpublished study by A. M. Barclay entitled
“Information as a Defensive Control of Sexual
Arousal,” 1969.

20 A. M. Barclay, personal communication, August
20, 1969.
experience and drive levels as influences on
AP secretion. However, there are presently
strong indications that urinary AP secretion
may be a useful measure of sexual arousal.
The results of Barclay's study in progress will
be crucial in determining the affect-response
specificity of AP secretion.


Critique and Conclusions


An examination of the publication dates of
the references shows that the study of the
physiology of the human sexual response is
new. Kinsey and Masters and Johnson de-
serve much of the credit for the breakthrough
in this taboo area. However, as with most new
is exploratory
rather than hypothesis testing. Inadequate
numbers of subjects are too often used to
make generalizations of any import. How can
one generalize about “sex differences” based
on sample sizes of a few subjects of each sex?


and carping. The other reason is charity. In
a new area every little bit of information is
helpful, and if these researchers have the
courage to breach the wall of taboo they may
be permitted the indulgence of some “let's
look and see” data collecting. Hopefully, as
more investigators enter the field, the com-
petition for journal space will result in a
“natural selection” of better designed and
conceived research.


The research review has attempted to an-
swer some simple questions concerning sexual
arousal. Do psychological sexual stimuli elicit
physiological responses of greater magnitude
than stimuli with nonsexual content? Do


females for heterosexual male
s for homosexual male subjects,
children for
fetish objects for fetishists, etc.? What is the effect of the ex-
perimental set or atmosphere on physiological
responses to sexual stimuli? 
Some other important questions have rarely 
been asked. What is the relation between
quantity and variety of experience and sexual 
arousal? What is the effect of relative depriva- 
tion (time since last orgasm) on sexual 
arousal? What are the ingredients of a sexual 
stimulus which make it relatively more or less 
arousing? 
In many studies the authors seem to as-
sume that the stimuli they are using are
sexually arousing. Playboy nudes are a favor-
ite type of stimulus. But in this era of public
nudity such stimuli may become quite hum-
drum. The study of Corman (1968) showed
considerably more arousal to erotic movies
than to Playboy nudes. If discrete slides are
used, why not use pictures of actual coitus?
In a study by Brady and Levitt (1965) pic-
tures of “ventral-ventral” coitus were rated
by males as more arousing than nudes or
pictures of other forms of sexual contact.
Movies of sexual activities are probably more
arousing than static pictures, but there are
problems in measuring physiological reactions
during such complex visual presentations.

Lazarus (1966) and his group in California
have evolved a psychophysiological methodol-
ogy for measuring reactions to movies which
should be studied by persons using such stim-
uli. The typical low-grade “stag” movie may
elicit hilarity or disgust along with sexual
arousal. The type of erotic movie used by
Corman (1968) and Romano (1969) may be
preferable to “stag” movies. Some attention
should be given to the stimulus dimension of
sexual arousal studies. The scaling of sexual
stimuli such as that done by Brady and 
Levitt (1965) is an example of what needs to 
be done. Other modes of presenting stimuli 
have not been explored, that is, auditory 
presentation, or combined visual and auditory. 
While measurements of arousal during actual 
coitus pose many problems for physiological 
measurement, autoerotic manipulation or the 
use of mechanical masturbatory devices might 
yield valuable data. 

This review has not dealt extensively with 
psychological methods used to measure sub- 
jective arousal. Many authors have even 
neglected to obtain such self-reports. Without 
this kind of data it is impossible to assess
whether the stimuli used were actually sex- 
ually arousing, and if physiological reactions 
were more related to subjective sexual arousal 
or to some other types of affective reaction. 

Experimenters attempting to use psycho- 
physiological methods to measure sexual 
arousal face some old problems familiar to 
those who have attempted to use psycho-
physiological methods to study other emo-
tions. Most measures from different periph-
eral autonomic systems are minimally or in-
consistently correlated across subjects. One
reason for this state of affairs is individual
specificity of response. Most subjects seem to 
have a most likely, or most powerful, channel 
of response. One subject may be an electro- 
dermal responder, another subject a heart 
responder, and so on. When one compares all 
GSR responses with all heart rate responses 
the relationship will be attenuated by the in- 
dividual differences in lability of each system. 
For this reason it is unlikely that the same 
peripheral autonomic indicator will be sensi- 
tive to sexual arousal in all persons. Evidence 
for individual specificity of autonomic re- 
sponse during sexual arousal may be found 
in a study by Hain and Linton.²¹ These in-
vestigators found interindividual differences in
physiological measures correlating with self-
rated reactions to sexual stimuli. Individuals
tended to be consistent within themselves in 
their physiological patterns of response. 
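
The attenuation described in this paragraph can be illustrated with a small numerical sketch. Everything in it is hypothetical: the two subjects, the arousal levels, the gain values, and the helper name `pearson_r` are assumptions chosen only to make the argument concrete, and none of the numbers come from the studies reviewed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical arousal levels presented to both subjects (arbitrary units).
arousal = [0, 1, 2, 3, 4]

# Hypothetical gains: (GSR change, heart rate change) per unit of arousal.
# Each subject has one dominant channel of response.
subjects = {
    "electrodermal_responder": (3.0, 0.3),
    "heart_responder": (0.3, 3.0),
}

within = {}                      # GSR-heart rate correlation inside each subject
pooled_gsr, pooled_hr = [], []   # all responses lumped together across subjects
for name, (gsr_gain, hr_gain) in subjects.items():
    gsr = [gsr_gain * a for a in arousal]
    hr = [hr_gain * a for a in arousal]
    within[name] = pearson_r(gsr, hr)
    pooled_gsr.extend(gsr)
    pooled_hr.extend(hr)

pooled = pearson_r(pooled_gsr, pooled_hr)
print("within-subject r:", within)
print("pooled r:", pooled)
```

In this toy case the two channels track each other closely within each subject, yet the pooled correlation across all responses is much weaker, because each subject's dominant channel differs. Real data would of course include noise and more than two subjects.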

Another problem is that of habituation. 
When the same type of stimulus is presented 
repeatedly over a lengthy series of trials, 
physiological responses are typically large to 
the first presentation and thereafter diminish 
in intensity. The reaction of the first trial may 
be as much a function of novelty or surprise 
as the nature of the stimulus itself. 

Shifting base lines of response are a prob- 
lem. The magnitude of response in many sys- 
tems is inversely related to the base line from 
which the response began. Results may be 
radically different depending on what kind of 
response or change measure is used. Covari- 
ance techniques and Lacey's Autonomic Labil- 
ity Score (1956) can be used to remove the 


21 Unpublished paper by J. D. Hain and P. H. 
Linton entitled “Physiological correlates and pre- 
dictors of cognitive emotional response." 
influence of the base-line measure from the
response measure. 
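
As a rough sketch of how such a base-line correction works, the Autonomic Lability Score is commonly stated as ALS = 50 + 10 (z_y - r_xy z_x) / sqrt(1 - r_xy^2), where z_x and z_y are the standardized base-line and stimulus levels and r_xy is their correlation across subjects. The function name `autonomic_lability_scores` and the sample values below are hypothetical and are used only for illustration; the original source should be consulted before the formula is applied to real data.

```python
import math


def autonomic_lability_scores(base_levels, stimulus_levels):
    """Return base-line-free scores with mean 50 and SD 10 across subjects.

    Each score is the standardized residual of the stimulus level after
    regressing it on the base-line level, rescaled to mean 50 and SD 10.
    """
    n = len(base_levels)
    if n != len(stimulus_levels) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 values")

    mean_x = sum(base_levels) / n
    mean_y = sum(stimulus_levels) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in base_levels) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in stimulus_levels) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("levels must vary across subjects")

    z_x = [(x - mean_x) / sd_x for x in base_levels]
    z_y = [(y - mean_y) / sd_y for y in stimulus_levels]
    r_xy = sum(zx * zy for zx, zy in zip(z_x, z_y)) / n
    if abs(r_xy) >= 1:
        raise ValueError("base-line and stimulus levels are perfectly correlated")

    scale = math.sqrt(1 - r_xy ** 2)
    return [50 + 10 * (zy - r_xy * zx) / scale for zx, zy in zip(z_x, z_y)]


# Hypothetical heart rates (beats per minute) for five subjects.
base = [60.0, 72.0, 65.0, 80.0, 70.0]
stim = [75.0, 78.0, 74.0, 84.0, 82.0]
scores = autonomic_lability_scores(base, stim)
print([round(s, 1) for s in scores])
```

By construction the resulting scores are uncorrelated with the base-line levels, which is the property that makes this kind of score attractive when the magnitude of response depends on the starting level.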

Stimulus-response specificity can only be 
assessed by comparing the responses to more 
than one type of stimulus. This involves more 
than comparing sexual stimuli to blank slides 
or neutral stimuli. The question posed by 
Kinsey et al. (1953) in their review of au- 
tonomic findings is, Are there any autonomic 
reactions that can distinguish sexual arousal 
from other states of arousal such as fear and 
anger? Stimuli calculated to arouse emotional 
states other than sexual should be included in 
studies of this problem. Most of the studies 
using negative stimuli in addition to sexual 
stimuli have not found differences in re- 
sponse magnitude (Bernick & Kling, 1967; 
Koegler & Kline, 1965; Levi, 1967; Lifshitz, 
1966; Peavler & McLaughlin, 1967; Romano, 
1969). Even if only sexual stimuli were used
it would be helpful to have verbal reports on 
other possible reactions to such stimuli. 

The effects of the general experimental situ- 
ations on the subjects have not been con- 
sidered in most experiments. The experiments 
that considered these set factors (Barclay, 

1970; Chapman et al., 1969; Martin, 1964) 
have found that even physiological responses, 
GSR, pupil size, and urinary acid phosphatase 
secretion, may be influenced by set induced 
by instructions, or the characteristics and be-
havior of the experimenter. Until recent times
sexual response has been considered a semi- 
private matter beyond the realm of scientific 
study. Confronted with prying experimenters 
attaching electrodes, penile plethysmographs, 
vaginal devices, and showing pornographic
stimuli, many subjects might be inclined to 
inhibit voluntary response. Such inhibition can 
also have consequences for physiological re- 
sponses. A Playboy cartoon shows a naked 
man and woman all wired up and under the 
eye of the researchers’ television camera. The 
man plaintively says: “I just don’t feel like 
it.” Failure to consider the human qualities 
of subjects can often lead to erroneous con- 
clusions in psychological experiments. 

Certainly, GSR has been the most favored 
psychophysiological toy of psychologists. In 
most of the experiments, the nude adult fe- 
male figure has proven to be a powerful stimu- 
lus for GSR. But the amplitude of GSRs did 
not reflect the favored sexual object in several
studies. Since sweaty palms are not specifically 
involved in the adaptive sexual reaction it is 
clear that the GSR may reflect the novelty of 
nude stimuli, or even negative reactions, as 
much as sexual arousal. This may be par- 
ticularly true of the reaction of women to 
male nudes, since Kinsey (1953) reported 
that most women say they are not aroused
by the mere sight of the nude male body.

Although heart and breathing rates are re- 
markably accelerated during actual coitus, 
they do not seem to be very sensitive to the
milder arousal produced by psychological
stimuli; erotic movies effect some increase.
Blood pressure was the only cardiovascular 
measure which showed a graded responsive- 
ness to erotic stimuli. 

Hess's studies generated a flurry of interest 
in pupil size as an index of positive arousal 
particular. 
to support 
his hypotheses about the stimulus specificity 
of the responses of males and females, and
heterosexuals and homosexuals. As with GSR,
the nude figure does elicit larger responses 
than clothed figures, but the sex of the nude 
In fact, there is 
a tendency in several studies for the male
subjects to show more dilation in response
to male nudes than to female nudes. Rather


GSR the pupil is a labile system, but unlike


The extensive series
of studies by Freund has shown good dis-
crimination of homosexuals, heterosexuals, and
pedophiliacs. Comparable devices for measur-
ing female sexual arousal are still in develop-
mental stages. The development of the vaginal
blood flow measure mounted on a diaphragm 
ring (Shapiro & Cohen, 1968) and the Fisher 
and Davis modification of Tart's device seem 
to be promising methods. 

In the biochemical area there has been a 
sad neglect of the sex hormones as possible 
indicators of sexual arousal. Levi has shown 
that adrenaline and noradrenaline show re- 
sponses to erotic films, but they are elevated 
by other types of films as well. A suggestive 
finding by Gustafson et al. (1963) using 
urinary acid phosphatase has been followed 
up in a series of studies by Barclay. This 
measure of prostatic secretion might offer a 
more specific biochemical index of arousal in 
males. 

The potential applications of some of the 
methods being developed are already ap- 
parent in the diagnosis and treatment of 
sexual deviants. Many theoretical questions 
concerning human sexual behavior await the 
development of objective and quantitative 
methodology. This methodology is evolving 
and if research continues to grow in this field, 
one day Gebhard and Masters may be able 
to take auto trips together in the security of 
knowing that what they pioneered will go on 
without them. 
s of meaningfulness to measure semantic


satiation, and in other ways have attempted to test the effects of semantic satiation, 
are critically evaluated. It is concluded that the effects of the phenomenon labeled 
as semantic satiation have not been reliably measured and are in doubt. An attempt 
is made to link semantic satiation to what has been called the verbal transformation 
effect, and an alternative approach to the study of semantic satiation is suggested. 


Since Amster’s (1964) favorable review of 
the results of semantic satiation studies, new 
studies have appeared which cast doubt on the 
validity of the measurement of semantic satia- 
tion. The methods which have been used to 
measure the phenomenon have become open to 
question, necessitating a reevaluation of the 
entire field. Such a review is attempted in this 
study. 

Semantic satiation has been defined as the 
loss or decrement of the meaning of a stimulus 
word following either (a) (overt) verbal repeti-
tion, (b) prolonged visual inspection, or (c) re-
peated writing of the stimulus word. Attempts
at measurement of the phenomenon have in- 
cluded the use of subjective report, the 
commonality of associates to the test word, 
Osgood’s semantic differential, number of 
associations elicited by the test word, decision 
latency, and word-search time. The effects of 
semantic satiation on verbal learning and prob- 
lem solving have also been investigated. 


SUBJECTIVE REPORT 


In 1907, Severance and Washburn wrote 
that six subjects, who had been instructed to 
look fixedly at several words for 3 minutes per 
word and to report all changes in the words 
during this period, reported perceptual changes 
and a loss of meaning of the words. Focusing 
on the reported “loss of meaning" in the 
Severance and Washburn study, Bassett and 
Warne (1919) were the first to use verbal repe- 
tition as the satiation treatment. Their two 
subjects were instructed to "repeat the word 


1 Requests for reprints should be sent to Leroy H.
Pelton, Department of Psychology, State University of 
New York, 1400 Washington Avenue, Albany, New 
York 12203. 
aloud until it had lost its meaning." They and
other investigators (Don & Weld, 1924;
Fillenbaum, 1963a; Wertheimer, 1958; Wert-
heimer & Gillis, 1958) also used subjective re-
port as the dependent variable.

Miller (1963) found that when people pushed
a drawer while repeating PUSH, they took sig-
nificantly longer to report loss of meaning than
subjects in a no-action control group. Lifting a
window while repeating LIFT yielded results
which were in the same direction, but not sig-
nificant. In addition, Miller reported that if a
subject pulled the drawer while repeating PUSH,
satiation times were also significantly longer
than for the control group, even though they
were still significantly shorter than for the
group who pushed the drawer while repeating
PUSH. In a second experiment, visual stimula-
tion, that is, looking versus not looking while
pushing the drawer, was found to have a sig-
nificant effect, but only for PUSH. Since only
two words were used in the experiment, and
since the results were different for the two
words, one cannot rule out a confounding
between word-specific effects and conditions.
Therefore, these results are very tenuous at
best.
Though the subjective report studies have 
been consistent with each other in that they 
have generally been interpreted as providing 
positive evidence for the semantic satiation 
phenomenon, one must also note severe criti- 
cisms of these studies. In each of the earlier 
studies (Bassett & Warne, 1919; Don & Weld,
1924; Severance & Washburn, 1907) the results
may have been influenced by the small
number of subjects (six or fewer in each study).
In all the cited studies, except that by Severance
and Washburn, subjects were told in the
instructions to expect a loss of meaning of the
test words as a result of the visual fixation
and/or verbal repetition. These instructions
may have influenced the subjects to label any
experienced change as a "loss of meaning."
Only in the Severance and Washburn study
did some subjects spontaneously label any
experienced changes as a loss of meaning.


COMMONALITY OF ASSOCIATES 


Smith and Raygor (1956) first used the 
commonality of associates as a dependent vari- 
able. After the satiation treatment, subjects 
gave one association, the first that came to 
mind, to the test word. Subjects in the control 
condition were shown the test word for 40 milli- 
seconds and asked to respond as above. The 
assumptions were that less common associates, 
defined according to the Kent-Rosanoff (1910) 
frequency table, imply greater satiation effects 
and that meaning and associative power are 
somehow related. Subjects gave significantly 
fewer common associates to test words involved 
in the satiation treatment compared to test 
words which were not. The experimenters con- 
cluded that satiation had occurred and was 
measurable by this method. 

One of the dependent variables used by Paul 
(1962) was similar to that of Smith and Ray- 
gor. Hypothesizing that inhibition due to 
repetition of a word would generalize to associ- 
ates, Paul had subjects associate to a word for 
1 minute, repeat for 30 seconds a word whose 
relationship to the first word was one of either 
identity, high association, low association, or 
unrelatedness, and then associate to the first 
word again. Corresponding control groups 
underwent the same treatment except that the 
30 seconds of repetition were replaced by a 30- 
second pause between the first and second 
association test. Each subject's first emitted
association was assigned a number equal to the 
frequency with which the association occurred 
to the appropriate stimulus word in the Minne- 
sota norms (Russell & Jenkins, 1954). Paul 
concluded that there was a weak inhibition 
effect due to repetition such that the first 
"associations in the control condition were 
more popular (of higher frequency) than in the 
experimental condition [p. 165]" even though 
the probability of his F ratio was between .05 
and .10. 
One must not rule out possible explanations
involving some general effect due to repetition 
which does not involve meaning. For both sets 
of words used in the Paul study, the log fre- 
quencies of the associates observed in the unre- 
lated experimental condition showed decreases 
equal to or greater than the decreases for the 
log frequencies in the identity experimental 
condition when compared to their respective 
control conditions—a result which the satia- 
tion hypothesis would not predict. 
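
The commonality scoring described above can be illustrated with a minimal
Python sketch. The norm counts, the stimulus word, the response lists, and
the function names are all hypothetical and serve only as illustration; none
of them is taken from the Minnesota norms or from the studies reviewed here.

```python
import math

# Hypothetical norm table: how many people in a normative sample gave each
# associate to the stimulus word. These counts are invented for illustration.
HYPOTHETICAL_NORMS = {
    "table": {"chair": 840, "food": 40, "desk": 20, "leg": 10},
}


def commonality(stimulus, associate, norms=HYPOTHETICAL_NORMS):
    """Return the norm frequency of a first associate (0 if it is unlisted)."""
    return norms.get(stimulus, {}).get(associate, 0)


def mean_log_frequency(stimulus, associates, norms=HYPOTHETICAL_NORMS):
    """Mean of log10(frequency + 1) over a group's first associates."""
    scores = [math.log10(commonality(stimulus, a, norms) + 1) for a in associates]
    return sum(scores) / len(scores)


# Invented first associates for a control group and a repetition group.
control = ["chair", "chair", "food", "chair"]
repetition = ["chair", "desk", "leg", "food"]

difference = (mean_log_frequency("table", control)
              - mean_log_frequency("table", repetition))
print(f"control minus repetition (mean log frequency): {difference:.3f}")
```

A positive difference would indicate less common associates after repetition.
Under the satiation hypothesis such a decrease should be larger when the
repeated word is the test word itself than when it is an unrelated word.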

Additional studies by Wolfensberger (1963) 
using Tresselt’s (1958) frequency tables, by 
Baras (1968) using Palermo and Jenkins'
(1964) frequency tables, and by Cramer (1968) 
using the Palermo and Jenkins (1964) female 
norms, found no evidence for satiation effects 
using the commonality of associates measure. 

Investigating the effects of repetition of re- 
sponse members as a function of the relation- 
ship between the stimulus and response of 
paired associates, Goldman, Costanzo, and 
Lehrke (1968) had subjects learn paired associ- 
ates which were either common associates, 
semantic space associates, or nonassociates. 
The subjects then repeated the response mem- 
ber and were immediately afterward asked to 
recall its corresponding stimulus member. The 
common-associate and semantic-space-associ- 
ate conditions yielded fewer errors which the 
authors interpreted as evidence for their 
hypothesis that the stronger the preexperimen- 
tal association between a word and its asso- 
ciate, the more difficult for inhibition (through 
repetition) to occur. But, in the absence of no- 
repetition control groups, one cannot be sure 
that this result is not simply due to the 
relationships between the paired associates (see 
Esposito & Pelton, 1969) since the two factors 
(repetition and relationship) are confounded in 
this experiment and since, on the basis of 
results from backward association studies (see 
Ekstrand, 1966, for a review), one might expect 
exactly the same results because of the different 
stimulus-response relationships with no inter- 
vening satiation treatment. Thus, working 
from the commonness of the associate back to 
the stimulus, there is no unequivocal evidence 
for a difference due to verbal repetition. 

If for the above studies one were willing to 
assume (a) that there is no fundamental differ- 
ence between satiation treatments, and (b) 
that the different frequency tables were
appropriate for the respective groups, then the
results of Smith and Raygor (1956), Paul
(1962), and Goldman et al. (1968) are contradictory
to the results of Wolfensberger (1963),
Cramer (1968), and Baras (1968). Furthermore,
the results of Paul (1962) and Goldman
et al. (1968) have very plausible alternative 
explanations. 

Fillenbaum (1963b) conducted five experiments
differing from each other only in the
duration of the satiation treatment (written
repetition of the stimulus), in which subjects
repeated either a test word, a synonym of the
test word, or a word considered to be semanti- 
cally unrelated to the test word. After the 
satiation treatment, each subject was in- 
structed to give his first association to the test 
word. The dependent variable was the com- 
monality of the given associate as inferred from 
the rank order of the Minnesota norms 
(Russell & Jenkins, 1954) for the Kent- 
Rosanoff words. It was found that the syn- 
onym-satiated items generally yielded the 
least common associates. This study has been 
reviewed and criticized by Esposito and 
Pelton (1969). 

Esposito and Pelton (1969) claimed that it 
is possible that Fillenbaum's dependent vari- 
able did not measure semantic satiation and 
that the synonym-satiated condition may have 
yielded less common associates due to a 
priming effect (see, e.g., Cramer, 1964; Howes
& Osgood, 1954). That is, the probability of 
occurrence of less common associates to the 
test word may have been increased because of 
the verbal context of the test word. To test 
this possibility, Esposito and Pelton (1969, 
Experiment I) replicated Fillenbaum's (1963b) 
conditions as closely as possible, but excluded 
any satiation treatment. Their results were the 
same as Fillenbaum's, namely, the synonym 
condition yielded the least common responses, 
and they concluded that Fillenbaum did not 
measure any effects due to repetition. 

Gumenik and Spencer’s (1965) results are 
also consistent with this priming interpretation 
of Fillenbaum’s (1963b) results. They argued 
from their results that repetition of a synonym 
of the test word sets the subject to respond to 
the test word in terms of the meaning which 
the synonym shares with the test word, thereby 
changing the meaning of the test word. Words
unrelated in meaning to the test words, however,
cannot set the subject toward any particular
meaning of the test words. Gumenik
and Spencer did not employ a no-satiation
control group.

The priming interpretation and the set 
hypothesis are similar in that both depend only 
upon the semantic relations between the words.

A priming interpretation could also explain
Paul's (1962) results from his associative measure
if his reported F ratio was to be considered
significant. In the associate conditions subjects
in the experimental groups repeated a word
until immediately prior to the second association
measure. The repeated word (an associate
of the test word) might have provided a context
for the test word, thus priming less
frequently occurring associates to the test
word. Any context or priming effect might
have been sufficiently eliminated in the control
conditions because subjects only momentarily
saw the associate of the test word, and that
momentary viewing was followed by a 30-second
interval. As shown by Esposito and
Pelton (1969), the priming effect is greater
when the priming or context word is a synonym
of the test word, and associates used by Paul 
might be considered as (associative) synonyms 
of the test words. Paul's data did show the 
greatest differences between the experimental 
and control conditions to be for the words 
satiated on associates. Results from another 
of Paul's measures, latency of the subject’s 
most popular response, could also be explained 
in this way.

It must be noted that Cramer’s (1968) results
from her normative response rank measure
showed no evidence that prior presentation
of a synonym yields less common
associates than prior presentation of the test
word itself, in contrast to the results of Fillenbaum
(1963b), Gumenik and Spencer (1965),
and Esposito and Pelton (1969). Esposito and
Pelton (1969) hypothesized that this might be 
accounted for if Cramer used synonyms which 
were highly related to the test words, unlike 


With this method of measurement, only one
study (Smith & Raygor, 1956) yielded significant
results which could be interpreted as positive
evidence for semantic satiation; Paul's
(1962) study yielded results which he considered
significant, but which failed to reach
the .05 level of significance. Fillenbaum's
(1963b) results, which were originally attributed
to satiation, were replicated by Esposito
and Pelton (1969) without the use of any satia- 
tion treatment at all. Esposito and Pelton 
accounted for their results in terms of the 
semantic relationships among the words used. 
The results of Paul (1962), Gumenik and 
Spencer (1965), and Goldman et al. (1968) 
could also be explained in this way. Wolfens- 
berger (1963), Cramer (1968), and Baras 
(1968) found no evidence for the satiation 
hypothesis using the commonality of associates 
measure. 


SEMANTIC DIFFERENTIAL 


Operationally defining semantic satiation as 
a decrease in the polarity of ratings of words on 
Osgood's semantic differential (see Osgood, 
Suci, & Tannenbaum, 1957), Lambert and 
Jakobovits (1960) were the first to employ this 
measure in a semantic satiation study. They 
had subjects rate five words on nine scales 
before the satiation treatment and then indi- 
vidually rate each of the words on each of the 
scales following 15 seconds of overt, verbal 
repetition at a rate of 2-3 repetitions per sec- 
ond. That is, one word was repeated aloud and 
then rated on one scale, another word was re- 
peated and then rated on another scale, etc., 
until all five words were rated on all nine scales. 
This group was compared with a number of 
control groups. The difference from the pre- 
to the postsatiation ratings for the satiation 
group was significant, the average change being 
toward the middle point of the semantic differ- 
ential scales. None of the other groups showed 
a significant change in ratings. 

Floyd (1962) attempted to replicate Lam- 
bert and Jakobovits’ (1960) results with a 
slightly modified experimental design. Though 
he found a significant decrease in semantic 
differential ratings from the pre- to the post- 
repetition ratings for the six repeated words, he 
also found no difference between the postrepe- 
tition ratings for the repeated words and the 
ratings of six nonrepeated control words. How- 
ever, it must also be noted that the repeated 
and nonrepeated words were different so that
the possibility of word-specific effects exists. 

Using essentially the same method as that 
used by Lambert and Jakobovits (1960), Yelen 
and Schulz (1963) failed to find any effects of 
the satiation treatment. In one experiment 
Yelen and Schulz found that some semantic 
differential scales seemed to yield consistent 
satiation effects while other scales seemed to 
yield consistent generation effects. 

Yelen and Schulz explained their findings in 
the following way. The “satiation scales," as 
shown by their results, are those which initially 
have ratings at the extremes of the scale; the 
"generation scales" are those which have initial 
ratings closer to the middle point. Therefore, 


the present results are most simply explained as a re- 
gression-like phenomenon whereby repetition in some 
way interferes with S’s [subject’s] recall of his initial 
ratings or disposes S to believe that he is to change his 
initial ratings following repetition. Hence, if S's initial 
rating was relatively intense, then there is a greater 
possibility that his second rating will be less so. Analo- 
gously, if his initial rating is relatively neutral, the like- 
lihood is increased that his second rating will be more 
intense [p. 377]. 


Further, they wrote, 


it is of interest that 16 of 30 Ss in the repetition condi- 
tions in Exp. IV, when asked what they thought the 
purpose of the experiment was, indicated in one way or 
another that change in ratings as a result of repetition 
was the purpose. Under control conditions, only 4 Ss 
thought change in ratings as a function of time was the 
purpose [p. 377]. 


Amster (1964) reported that her analysis of 
results obtained by Messer, Jakobovits, Ka- 
nungo, and Lambert (1964; also see later dis- 
cussion) showed a + .42 rank-order correlation 
between the magnitude of satiation effect and 
initial word rating. This correlation was only 
found in experimental satiation groups, and not 
for various control groups. Such a correlation 
might be interpreted as evidence for Yelen and 
Schulz's regression hypothesis. Amster stated, 
however, that though the regression hypothesis 
may account for some variance of ratings, it 
does not cast doubt upon the existence of the 
phenomenon. She tried to reconcile the differ- 
ences between the results of Lambert and 
Jakobovits (1960) and Yelen and Schulz 
(1963) by noting that Lambert and Jakobovits 
used a polarity difference measure while Yelen 
and Schulz used a mean difference score and 
that it is possible to account for discrepancies
in this way. As empirical evidence, an unpub- 
lished 1963 study by Hodge and Battig was 
cited by Amster showing that the different 
methods of scoring the same data could lead to 
different results. But, regardless of the method 
used, the Hodge and Battig results were not
statistically significant; that is, there was no
statistical evidence for semantic satiation
whether the polarity difference or mean differ- 
ence scores were used. Yet Amster was willing 
to conclude: “Despite the lack of statistical 
significance, these results seem inconsistent 
with those of Yelen and Schulz (1963) and 
consistent with those of Lambert and Jako- 
bovits (1960) [p. 227].” 
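
How two scoring methods can lead to different conclusions from the same
ratings can be shown with a small Python sketch. It assumes that ratings are
coded from -3 to +3 with the scale midpoint at 0, that a mean difference
score is the change in the signed mean, and that a polarity difference score
is the change in mean absolute distance from the midpoint. These definitions
and the ratings are illustrative assumptions, not the exact formulas or data
of the studies discussed.

```python
def mean_difference(pre, post):
    """Change in the signed mean rating: mean(pre) - mean(post)."""
    return sum(pre) / len(pre) - sum(post) / len(post)


def polarity_difference(pre, post):
    """Change in mean distance from the scale midpoint (coded as 0)."""
    pre_polarity = sum(abs(r) for r in pre) / len(pre)
    post_polarity = sum(abs(r) for r in post) / len(post)
    return pre_polarity - post_polarity


# Invented ratings: every rating moves toward the midpoint, but the moves
# on the positive and negative sides of the scale cancel in the signed mean.
pre_ratings = [3, -3, 2, -2]
post_ratings = [1, -1, 1, -1]

print(mean_difference(pre_ratings, post_ratings))      # 0.0
print(polarity_difference(pre_ratings, post_ratings))  # 1.5
```

With these invented ratings the mean difference score shows no change while
the polarity difference score shows a shift toward the midpoint.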

Schulz, Weaver, and Radtke (1965) conducted
a study using the same words, scales,
and procedures that Yelen and Schulz (1963, 
Experiment IV) found to produce satiation 
and generation effects, but employed a "post- 
test only" design so that the subjects did not 
make presatiation ratings. This eliminated the 
possibility of any regression effects. Analyzing 
both mean ratings and mean polarity scores,
the authors found no differences between the 
repetition and no-repetition conditions either 
on satiation or on generation scales. These re- 
sults are consistent with Yelen and Schulz’s 
regression hypothesis implying that semantic 
differential ratings are influenced by other- 
than-semantic factors which may account for 
results which have been attributed to semantic 
satiation. 
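
The statistical side of a regression account can be sketched with a short
simulation in Python. The sketch assumes that each rating is a stable value
plus independent rating error, clipped to a scale running from -3 to +3, and
that no repetition effect exists at all. The error model, the cutoffs of 2.5
and 0.5, and all names are assumptions made for illustration; the sketch does
not model Yelen and Schulz's suggestion that subjects believe they are
expected to change their ratings.

```python
import random
from statistics import mean


def clip(value, low=-3.0, high=3.0):
    """Keep a rating inside the bounds of the scale."""
    return max(low, min(high, value))


def simulate_retest(n_ratings=20000, noise_sd=1.0, seed=0):
    """Return mean polarity change for initially extreme and neutral ratings.

    Both ratings share the same underlying value and differ only by rating
    error, so any systematic change is a regression effect.
    """
    rng = random.Random(seed)
    extreme_changes, neutral_changes = [], []
    for _ in range(n_ratings):
        true_value = rng.uniform(-3.0, 3.0)
        pre = clip(true_value + rng.gauss(0.0, noise_sd))
        post = clip(true_value + rng.gauss(0.0, noise_sd))
        change = abs(post) - abs(pre)  # negative = moved toward the midpoint
        if abs(pre) >= 2.5:
            extreme_changes.append(change)
        elif abs(pre) <= 0.5:
            neutral_changes.append(change)
    return mean(extreme_changes), mean(neutral_changes)


extreme, neutral = simulate_retest()
print(f"initially extreme ratings: mean polarity change {extreme:+.2f}")
print(f"initially neutral ratings: mean polarity change {neutral:+.2f}")
```

Under these assumptions, initially extreme ratings lose polarity on retest
(an apparent satiation effect) and initially neutral ratings gain polarity
(an apparent generation effect), although nothing was repeated.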

In a subsequent article, Jakobovits and 
Lambert (1967) reiterated and expanded the 
criticisms that Amster (1964) made regarding 
Yelen and Schulz’s scoring procedures. In 
addition, Jakobovits and Lambert questioned 
Schulz et al.’s (1965) use of untransformed 
scores, as well as their use of transformed
(polarity) scores, namely, whether the trans- 
formation into polarity scores was made for 
each subject's individual ratings (as was done 
by Lambert and Jakobovits, 1960), and the 
use of a posttest-only design. The first two 
issues were raised since, again, it was shown 
that differences in scoring could lead to different
conclusions from the same data. The third
issue was raised because Jakobovits and 
Lambert (1967) contended that there are 
"enormous individual differences in ratings 
involved when the scores of one group of per-
sons are used for initial ratings and those of 
another group are used for final ratings [p. 
956]." 

Schulz (1967) answered these criticisms by 
stating that whether one observes differences 
in results due to differences in scoring procedures
depends partly upon the particular distribution
of pre- and postrepetition ratings.
Further, Schulz noted that in Yelen and
Schulz's (1963) Experiments I through IV,
verbal repetition yielded no consistent results
even though the polarity difference scores were
used. Regarding untransformed scores, Schulz
stated that they never advocated untrans- 
formed instead of transformed scores; rather, 
untransformed scores were used in addition to 
transformed scores. Schulz further explained 
the reason for their use, and presented some 
data from Yelen and Schulz (1963, Experiment 
IV) which supported the regression hypothesis. 
In reply to Jakobovits and Lambert’s question- 
ing of the transformation of scores, Schulz 
flatly stated, “it is insinuated that the ratings 
might have been improperly transformed to 
polarity scores. This was not the case [p. 959]."
Finally, concerning the issue of individual dif- 
ferences, Schulz noted from the evidence pre- 
sented in Table 2 of the Schulz et al. (1965) 
study that the distributions of initial ratings 
obtained from two independent samples of 
subjects representing the same population 
were very comparable, and that the standard 
deviations of the repetition and control groups 
were .45 and .46, respectively, thus providing
little evidence for “enormous individual differences.”
“Hence, unless Jakobovits and Lambert
are proposing that the theory of random
sampling be repudiated, it is difficult to com- 
prehend their objections to the posttest-only 
design [p. 959].” In a second posttest-only 
study, Shima (1966) also found no indication 
of satiation. 

Approaching the problem of the regression
hypothesis in a slightly different way, Jakobovits
and Rice² tested, among other variables,
the effect of initial polarity on the direction
and amount of change in semantic differential

² Jakobovits, L. A., and Rice, U. M. Semantic satiation
as a function of initial polarity and scale relevance.
A later version of a paper presented at the meeting of
the Eastern Psychological Association, Boston, April

ratings. They reported that subjects displayed
no differences in the direction and amount of 
change in ratings, regardless of the initial 
polarity of the stimulus. But in their experi- 
mental design, Jakobovits and Rice had to use 
different semantic differential scales for each 
subject, thereby introducing an uncontrolled 
factor; and they also confounded the variable 
of relevance (i.e., they attempted to measure 
the relevance of a semantic differential scale 
to the word being rated on that scale) with the 
factor of initial level of polarity since all levels 
of polarity were not tested in combination with 
all levels of relevance. Consequently, one 
cannot determine what effects, if any, resulted 
from varying initial polarity only or from 
varying relevance only. Therefore, the experimenters’
conclusions that initial semantic
ratings had no effect but that there was a
significant regression effect due to relevance are 
vitiated. A study to experimentally separate 
polarity and relevance might be useful to 
determine (a) the relationship, if any, between 
polarity and relevance; and (b) the relation- 
ship, if any, between Yelen and Schulz's 
satiation and generation scales and relevance. 
The latter relationship is important if, as some 
people have written (e.g., Osgood & Suci,
1955; Osgood et al., 1957; Weinrich, 1958), in 
some cases a middle rating on a semantic 
differential scale is due to the irrelevance of 
the scale to the concept being rated. Hence, 
polarity and relevance may not be orthogonal 
factors implying that regression effects due to 
relevance may have the same significance as 
regression effects due to initial polarity. 

Because of (a) the confounding of polarity 
and relevance, (b) the possible relationship 
between polarity and relevance, and (c) the 
regression effect for relevance, the results of 
Jakobovits and Rice (see Footnote 2) do not 
contradict the previous findings with regard to 
regression effects. In particular, Yelen and 
Schulz’s regression hypothesis still remains a 
plausible explanation for results obtained with 
the use of the semantic differential. (One might 
also note that Jakobovits and Rice’s study is 
not a good demonstration of the semantic 
satiation phenomenon since no control groups 
were used.) 

In a subsequent study, Kasschau (1969) 
tested three levels of initial meaning intensity
(2.5, 1.5, and 0.5 units away from the midpoint
of semantic differential scales), three semantic
differential factors (Evaluative, Potency, and
Activity), and seven levels of repetition duration
(0, 5, 10, 15, 30, 60, and 120 seconds) in a
pre- and postrepetition rating design. Both 
mean difference scores and polarity difference 
scores were computed. For both scoring pro- 
cedures, significant effects were found for the 
three main factors and the interaction between 
the semantic differential factors and initial 
meaning intensity, as well as a three-factor 
interaction for mean difference scores. It 
seems as if, in general, the regression hypothesis
could explain the results for the mean difference
scores. Two questions which arise,
particularly for the mean difference score data, 
are why the 0-second repetition conditions 
yielded satiation effects, and which repetition 
duration conditions differed significantly from 
the 0-second repetition conditions which might 
be construed as control conditions. For the 
polarity difference scores, the 1.5 initial-mean- 
ing intensity condition showed less satiation 
than the 2.5 condition at every level of repeti- 
tion duration which is in accordance with the 
regression hypothesis. However, the 0.5 initial- 
meaning intensity condition did not seem to 
differ from the 2.5 condition which is incon- 
sistent with the regression hypothesis. If one 
disregards the regression hypothesis, there 
still remains a result which is difficult to ex- 
plain, namely, why is the 1.5 initial-meaning 
intensity condition consistently below the 2.5 
and 0.5 conditions? One must then further 
ask what, if any, subtle differences exist 
between the pre- and postrepetition ratings 
design and the postrepetition-rating-only de- 
sign such that the studies employing the latter 
design consistently report no significant satia- 
tion effects (a fact which was further confirmed 
in our own laboratory in an unpublished 
study). Perhaps only further research will 
clarify these points. 

Other semantic differential studies have pro- 
duced contradictory or negative results with 
regard to the satiation hypothesis. Studying 
the effects of verbal repetition and type of bi- 
lingualism, Jakobovits and Lambert (1961) 
found a satiation effect for one of their satia- 
tion groups; but they also found a generation 
effect for a second satiation group and a “satiation”
effect for a control group, contrary to
expectations. Reynierse and Barch (1963) 
found no satiation effects, and Schwartz and 
Novick (1964) concluded from their study that 
the satiation treatment may have nonspecific, 
nonsemantic effects. Using both numbers and 
words for stimuli, Messer et al. (1964) found 
no satiation effects for numbers on the standard 
semantic differential scales, but did find an 
effect on a special meaningful-meaningless 
scale. For words, the satiation group showed a 
satiation effect on the standard semantic 
differential scales, but, unexpectedly, so did 
one of their control groups. No satiation effect 
was found for words on their special scale.
Madigan and Paivio (1967) obtained satiation 
and generation effects following verbal repeti- 
tion depending upon the type of instructions 
they gave their subjects with respect to the 
semantic differential. No evidence of satiation 
was found by Hupka and Goss (1969). All of
the above studies used pre- and postrepetition 
ratings designs, and their results are not 
necessarily inconsistent with the regression 
hypothesis in view of the fact that for some 
studies the semantic differential scales used 
were not published, thus precluding a conclu- 
sion with regard to the regression hypothesis. 
Of all the experiments reviewed in this sec- 
tion, no single one takes into account all possi- 
ble factors (e.g., relevance of scales, type of 
scales, regression effects, and meaningfulness 
of stimuli) and includes all appropriate control 
conditions. Though Lambert and Jakobovits 
(1960) reported positive evidence for semantic 
satiation using the semantic differential, Rey- 
nierse and Barch (1963), Yelen and Schulz 
(1963), Schulz et al. (1965), Shima (1966), and 
Hupka and Goss (1969) did not. In other 
studies, Jakobovits and Lambert (1961) and 
Messer et al. (1964) found satiation effects for 
some experimental groups but not for others, 
and also found unpredicted “satiation” effects 
for some of their control groups. Floyd's (1962) 
results were equivocal. Schwartz and Novick 
(1964) presented some evidence for a non- 
specific effect of repetition, and Madigan and 
Paivio (1967) provided some evidence that 
semantic differential instructions may affect 
ratings. Yelen and Schulz (1963) provided an 
alternative interpretation of Lambert and 
Jakobovits’ (1960) positive results stating that 
they might ultimately be due to the subjects’
perception of the experimental situation and a
concomitant regression effect, and provided
some evidence for that interpretation. The regression
hypothesis might also be able to
gression hypothesis might also be able to 
account for the significant differences found 
for some different-word control groups (i.e., 
groups in which subjects rate Word A, repeat 
unrelated Word B, and then rate Word A 
again; e.g., see Jakobovits & Lambert, 1961; 
Messer et al., 1964) if, in some way, subjects 
in these groups also perceived the "purpose" of 
the experiment as being the investigation of 
changes in pre- and postrepetition ratings due 
to intervening repetition of any word. The re- 
sults of Jakobovits and Rice (see Footnote 2) 
were contrary to Yelen and Schulz’s hypothesis,
but at least two of Jakobovits and
Rice’s variables were confounded, and this 
confounding could also account for their re- 
sults. The results of Kasschau (1969) further 
confused the issue since part of his results were 
consistent with the regression hypothesis, and 
part were not. Until such time as positive 
results can be consistently and unequivocally 
obtained, controlling all appropriate factors, 
the semantic differential method of measuring 
semantic satiation is extremely suspect. 


NUMBER OF ELICITED ASSOCIATIONS


Defining meaning as an acquired response or 
set of responses to a stimulus (after Noble, 
1952, and others), Kanungo³ and his associates
(Kanungo & Lambert, 1963a, 1963b,
1964; Kanungo, Lambert, & Mauer, 1962) 
viewed semantic satiation as a temporary 
extinction phenomenon in which the response 
to or meaning of a verbal stimulus is tempo- 
rarily extinguished due to satiation treatment. 
Kanungo and Lambert (1963b) first used 
Noble's m as a measure of meaningfulness in a 
semantic satiation study in order to operationalize
their view of semantic satiation (m is the
average number of relevant associations given
to a stimulus within 60 seconds; see Noble,
1952). In their first experimental condition the
same group of subjects associated to the test
words before the satiation treatment (2-3

³ Kanungo, R. N. Semantic satiation and verbal
learning. Paper presented at the 74th Annual Convention
of the American Psychological Association, New
York, September 1966.

repetitions per second for 20 seconds with
visual fixation), and after the satiation treat- 
ment. Subjects in a control condition similarly 
associated to the test words twice, but with no 
intervening treatment. A second experimental 
condition was run in which one group of sub- 
jects associated to the test words without the 
satiation treatment, and a second group of 
subjects associated to the test words following 
satiation treatment on them. For the first 
experimental condition and the control condi- 
tion, contrary to expectations, there was a 
significant increase in the number of relevant 
responses, and therefore in the meaning of the 
words as measured by this method, from the 
first to the second test. For the second experi- 
mental condition a significant decrease in the 
number of relevant associations and a signifi- 
cant increase in the number of irrelevant (i.e., 
clang, perseverative, free, and illegible) as- 
sociations was observed from the pre- to the 
postrepetition test. The two significant dif- 
ferences for the second experimental condition 
were interpreted by Kanungo and Lambert as 
evidence of satiation, while the differences in 
the control and first experimental conditions 
were attributed to a memory effect which in 
the case of the first experimental condition 
was assumed to be greater than the presumed 
satiation effect. No different-word satiation 
group was employed to control for possible, 
diffuse, nonsemantic effects of repetition. 
There were no indications or examples of the 
types of responses considered to meet Noble’s 
vague criteria of irrelevance. Paul (1962, p. 
163), concerning the use of total number of 
associations, had stated: "Although some associations
were highly personal and others
seemed determined partly by inappropriate 
response chaining, they were still included in 
the subject’s score because no satisfactory ob- 
jective criterion for eliminating such responses 
could be found." The present authors have
had similar experiences using Noble’s m in their 
own research. They had two raters independ- 
ently judge associations for irrelevance, but 
the interrater agreement was generally ex- 
tremely low, indicating that the criteria of 
irrelevance are idiosyncratic to each judge. 
Another consideration is whether, for instance,
an increase in the number of illegible responses
is necessarily an indication of satiation or a
result of some other factor, such as fatigue.
Similar considerations have to be taken into 
account for the other criteria of irrelevance. 
One should, therefore, be aware of the fact 
that there was no difference in the total num- 
ber of elicited associations between the pre- 
and the postrepetition associative test for 
Kanungo and Lambert’s (1963b) second ex- 
perimental condition, and if one were to con- 
sider only the total number of responses, the 
two significant differences for the second ex- 
perimental condition would disappear. 
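
The low interrater agreement mentioned above can be expressed with a chance-corrected index. The following is a minimal sketch, assuming two raters and a binary relevant/irrelevant judgment. Cohen's kappa is used here only as one common choice; it is not the index reported by the present authors, and the function name and the sample judgments are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgments of eight associations: "R" = relevant, "I" = irrelevant.
rater_1 = ["R", "R", "I", "R", "I", "R", "I", "R"]
rater_2 = ["R", "I", "I", "R", "R", "R", "R", "I"]
kappa = cohen_kappa(rater_1, rater_2)
```

A value near zero (or below) indicates agreement no better than chance, which is the situation one would expect if the criteria of irrelevance are idiosyncratic to each judge.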

In other studies, Paul (1962) and Shima 
(1966) failed to find any satiation effects using 
total number of associations elicited within 
60 seconds. Similarly, Cramer (1968) and 
Baras (1968) using the total number of associ- 
ations elicited in 30 seconds and 15 seconds, 
respectively, also failed to find satiation 
effects. However, neither Paul, Shima, Cramer, 
nor Baras distinguished between relevant and 
irrelevant responses (i.e., they did not use
Noble’s m). 

Until more stringent criteria are devised for 
the classification of associations as either rele- 
vant or irrelevant, this method does not seem 
to be reliable for measuring semantic satiation. 


DECISION LATENCY 


Jakobovits and Lambert (1962b) introduced 
the use of decision latency as a measure of 
semantic satiation with the rationale that 
tasks involving less meaningful symbols would 
take longer when the meaning of the symbol is 
required for solution. They gave each of their 
subjects pairs of numbers to add. Half of the 
pairs were preceded by 15 seconds of verbal 
repetition of one of the addends; the other half 
of the pairs were preceded by the repetition of 
a number different from either addend. Then 
the first addend was exposed for half a second 
followed by a half-second exposure of the sec- 
ond addend. Each subject responded as soon as 
possible after the second addend appeared by 
pressing the keys (indicating the integers 0 to 
9), one at a time, which corresponded to this 
answer. Jakobovits and Lambert then sub- 
tracted the control latencies (e.g., the latency 
from repeating 1 and adding 7 + 7) from their 
corresponding experimental latencies (e.g., 
repeating 7 and adding 7 + 7) to find an aver- 
age significant positive difference of .063 
seconds, indicating to them that satiation had
occurred.
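
The difference score described above can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of Jakobovits and Lambert's analysis: the latencies are hypothetical, and the function name is introduced here only for the example.

```python
from statistics import mean


def mean_latency_difference(experimental, control):
    """Mean of paired (experimental - control) decision latencies, in seconds."""
    if len(experimental) != len(control) or not experimental:
        raise ValueError("latency lists must be non-empty and of equal length")
    return mean(e - c for e, c in zip(experimental, control))


# Hypothetical latencies for four addition problems.
# experimental: an addend was repeated beforehand (e.g., repeat 7, add 7 + 7)
# control: a different number was repeated beforehand (e.g., repeat 1, add 7 + 7)
experimental = [1.25, 1.10, 1.40, 1.05]
control = [1.20, 1.00, 1.35, 1.00]
diff = mean_latency_difference(experimental, control)
```

A positive mean difference is what was taken as evidence of satiation. The computation itself cannot separate an effect of repetition from an effect of any other variable that differs systematically between the two sets of trials.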

It may be noted that a no-satiation control 
group was not included in this study and that 
the numbers preceding the addends for the 
control condition were always either 1, 2, or 3 
while the satiator numbers preceding the 
addends in the experimental condition were 
always either 7, 8, or 9. Perhaps there could be 
an effect attributable to the magnitude of the 
satiator number. For instance, if subjects have 
a first impulse to add all three numbers, re- 
gardless of the instructions, then it is not un- 
tenable that decision latency would be affected 
by the magnitude of the satiator number since, 
in general, larger numbers take longer to add. 
Though more formal experimentation is neces-
sary on this point, informal observation by one
of the present writers does find a tendency of 
subjects to add all three numbers. 

Messer et al. (1964) found no differences 
between the initial and the final latencies of 
semantic ratings for numbers, indicating that 
their satiation group did not differ from their 
control groups with regard to response laten- 
cies. Neither did they find evidence for number 
satiation from the semantic ratings. 

Using decision latency as a dependent vari- 
able to extend Jakobovits and Lambert's 
(1962b) rationale to words, Fillenbaum (1964) 
had subjects repeat a word for 1 minute (or, in 
one experiment, until the reported loss of 
meaning) and then decide whether a pair of 
words were synonymous. The relationship 
between the satiator word and one of the 
words of the pair was one of either identity, 
unrelatedness, or synonymity. The relation-
ship between the members of the pairs of
words (the decision pair) was one of either
close synonymity, far synonymity, or unrelat-
edness (according to Haagen, 1949). In general,
the close synonym pairs yielded the short-
est decision latencies, followed in order by the
unrelated pairs and the far synonym pairs.
For the satiator-decision-pair relationship, the
shortest latencies were for the identical condi-
tion, followed by the synonym condition and
the unrelated condition. This study, and
Fillenbaum’s explanation of his findings, have
been reviewed and criticized elsewhere (Espo-
sito & Pelton, 1969).

Gough and Rohrman (1965) designed a
study in which amount of repetition (1 or 15
seconds) of the satiator word, decision-pair 
relationship (synonym or unrelated), and 
amount of practice were varied. They found 
that giving subjects one of the words of the 
decision pair as a “satiator” word before they 
decided if the words of the decision pair were 
synonymous (the “forewarning” condition) 
yielded significantly shorter latencies than did 
using a “satiator” word unrelated to the deci- 
sion pair. Practice effects were also significant, 
but repetition had no effect. Gough and
Rohrman concluded that in Fillenbaum’s 
(1964) study, as in their own, the satiation 
treatment was not effective; forewarning was 
the important variable. The forewarning 
hypothesis was again supported by Rohrman 
and Gough (1967). 

In a similar experiment, Gumenik and Perl-
mutter (1966) found, in agreement with the
results of Gough and Rohrman (1965), that 
repetition had no effect on decision latency. 

Since none of these studies (Gough &
Rohrman, 1965; Gumenik & Perlmutter, 1966; 
Rohrman & Gough, 1967) employed all of 
Fillenbaum's (1964) conditions, Esposito and 
Pelton (1969), using Fillenbaum's material 
from Haagen (1949), attempted to reproduce 
Fillenbaum's conditions as closely as possible, 
but without using any satiation treatment. The
hypothesis was that his results were due to the 
semantic relationships among the words and 
not to semantic satiation. Their results, in 
general, duplicated those of Fillenbaum (1964). 

Gorfein (1967) did find some evidence for 
differences in association speed (or latency) 
following repetition in a study in which the 
commonness of the associations was experi-
mentally manipulated. Gorfein simultaneously
varied the popularity of the responses to a
stimulus (according to Palermo & Jenkins,
1964) and repetition (15 seconds) of either the
stimulus itself or the most commo.
of the stimulus. Only in the low. 
stimulus-repetition 
increase in the time 
their associations to 
pared with their pe 
items; the differenc 
tions were not 
sociative latency as a dependent variable, but
they did not distinguish between high-popu- 
larity and low-popularity items. 

Only three studies using latency measures 
(Fillenbaum, 1964; Gorfein, 1967; Jakobovits &
Lambert, 1962b) found any results which 
might have been interpreted as evidence for 
semantic satiation. However, Jakobovits and 
Lambert’s (1962b) results might have been due 
to a factor other than repetition, Fillenbaum's 
(1964) results were duplicated without satia-
tion treatment, and Gorfein's results indicate
an interaction effect which would challenge the 
generality of the phenomenon at least with the 
use of their measure. Therefore, more research
appears to be needed to clarify their
results.


VERBAL LEARNING AND 
PROBLEM SOLVING 


Many investigators have attempted to assess 
the effects of verbal repetition on words used 
in verbal learning situations (primarily the 
learning of paired associates) and on words 
needed to solve a subsequent problem. The 
writers would first like to point out various 
general problems which could alter the current 
interpretations of the verbal learning studies, 
and then do the same for the problem-solving 
studies. 

Although the investigators of the verbal 
learning studies attempted to assess any 
change in meaning of the relevant words prior 
to the learning situation, most did so by 
means of the semantic differential (Das, 1966; 
Jakobovits & Lambert, 1962a; Kanungo & 
Lambert, 1964; Kanungo & Ross, 1966; 
Kanungo et al., 1962). One must note, however, 
that in no cases were control groups used 
(neither different word nor no-satiation) with 
regard to the semantic differential ratings, and 
that in each case the same subjects did both 
the pre- and the postrepetition ratings so that 
the regression hypothesis cannot be eliminated 
as a possible explanation for the significant 
differences between the pre- and postrepetition 
ratings which were generally, but not always, 
found. A further reason for the consideration 
of the regression hypothesis with respect to 
these ratings is the fact that at least four of the 
cited studies used three semantic differential 
scales, two of which Yelen and Schulz (1963) 
found to be satiation scales. Therefore, these
semantic differential ratings are open to all the 
criticisms discussed in the semantic differential 
section; they in no way add to the previous 
arguments. 
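
Why the absence of control groups leaves the regression hypothesis open can be illustrated with a small simulation. The sketch below assumes that the regression hypothesis refers to statistical regression toward the mean in repeated ratings; the scale, the error model, the cutoff, and all names are hypothetical and are not taken from any reviewed study. No repetition treatment is simulated at all.

```python
import random


def simulate_regression(n_items=2000, cutoff=2.0, seed=0):
    """Return mean pre- and post-rating polarity for items selected as polar.

    Each rating is a stable "true" score plus independent rating error.
    Nothing happens between the two ratings: any drop in polarity is
    produced by selecting items on their (error-laden) first rating.
    """
    rng = random.Random(seed)
    pre_polarity, post_polarity = [], []
    for _ in range(n_items):
        true_score = rng.gauss(0.0, 1.0)         # stable position on a bipolar scale
        pre = true_score + rng.gauss(0.0, 1.0)   # first rating
        post = true_score + rng.gauss(0.0, 1.0)  # second rating, no treatment
        if abs(pre) >= cutoff:                   # keep only initially polar items
            pre_polarity.append(abs(pre))
            post_polarity.append(abs(post))
    pre_mean = sum(pre_polarity) / len(pre_polarity)
    post_mean = sum(post_polarity) / len(post_polarity)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression()
```

Under these assumptions the selected items show lower mean polarity on the second rating even though no treatment intervened, which is why a different-word or no-satiation control group is needed before a polarity decrease can be attributed to repetition.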

A second serious consideration of verbal 
learning studies, especially the paired-associate 
studies, is the possible effects which nonseman- 
tic factors not necessarily due to repetition had 
on the subsequent learning tasks. Inhibitory 
effects may, as is known, occur as a result of 
similarities of operations as well as materials 
of interpolated tasks, and similarities in
material may include such nonsemantic factors
as similarity of pattern, sound, etc. (see Wood-
worth & Schlosberg, 1954). Therefore, inhibi-
tion due to semantic satiation is not the only
possibility. Yet few of these other possible 
nonsemantic factors have been investigated;
experimenters for the most part have assumed 
that any change in the subjects’ performances 
was because of a meaning decrement of the 
stimulus material due to repetition. 

In at least one case, similar methods used by
different experimenters have yielded contra- 
dictory results. Kanungo et al. (1962) had sub- 
jects in an experimental group learn List 1 of 
paired associates. The subjects then rated the 
response members of List 2, repeated them for 
15 seconds, rerated them, and then learned 
List 2. Subjects in the control group learned 
List 1 (for matching purposes), rated, repeated, 
and rerated irrelevant words, and then learned 
List 2. The experimental group was signifi- 
cantly inferior to the control group on the 
number of errors and number of trials to cri- 
terion. However, Pyke, Agnew, and Adams 
(1966), reportedly using exactly the same 
method, found exactly the opposite results. It
must be noted that Pyke et al. attempted to 
explain this discrepancy for at least one of 
their groups by noting that "there is some sug-
gestion that had the PA [paired-associates]
items been matched for intralist competition, 
repetition might have led to poorer perform- 
ance, at least for the dexedrine drug group 
[p. 104].” 

Other findings which are difficult to explain 
are those of Kanungo and Lambert (1964). 
Following a prerepetition rating of eight 
words, each subject in each of three satiation 
groups repeated Word 1 of the eight words for 
either 5, 15, or 25 seconds and then rated that
word on one semantic differential scale. The 
same procedure was followed for Words 2-8 
after which Word 1 was again repeated and 
rated on another semantic differential scale, 
etc., until all words were rated on three seman- 
tic differential scales. Subjects then learned the 
eight words as response members in a paired- 
associates situation. Though there was a sig- 
nificant difference between the experimental 
satiation groups and the control groups, there 
were no differences among the 5-, 15-, and 
25-second repetition groups on the postrepeti- 
tion semantic differential ratings or with regard 
to the number of trials to criterion. Since 
Kanungo and Lambert found a significant 
decrease in semantic ratings following repeti- 
tion, the implication for the satiation hypothe- 
sis is that 5 seconds of repetition is sufficient to 
decrease the meaning of a word as measured on 
one scale, unless one is willing to assume that 
the effects of 5 seconds of repetition summate 
with the effects of another period of 5 seconds 
of repetition even though the repetition periods 
are separated by a minimum of 35 seconds 
needed for the repetition and rating of seven 
other words. Perhaps, as suggested by the 
regression hypothesis, another interpretation is 
that subjects’ perception of the purpose of the 
experiment is important, and 5 seconds of repe- 
tition might be equally effective as 25 seconds 
of repetition in altering the subjects’ views and 
performances accordingly. 

Das (1966) correlated subjects’ combined 
scores from two verbal conditioning tasks with 
what Das called semantic satiation as indexed 
by the semantic differential and found a 
-.484 coefficient. However, when Das sepa-
rated his subjects into fast learners and slow
learners on the basis of whether their perform- 
ances in the verbal tasks were above or below 
the median, he found that the fast learners 
group showed a mean polarity difference score 
of +1.31 from the pre- to the postrepetition
ratings which, as Das indicated, was evidence 
of semantic generation. Since the fast learners 
group represented half of the subjects provid- 
ing scores for the correlation coefficient, it is 
obvious that Das had correlated not semantic 
satiation (defined as a decrease in polarity 
scores) with verbal conditioning scores, but
polarity changes (both positive and negative) 
with verbal conditioning scores. On the basis 
of these results it is difficult to see how Das
could conclude as he does only about the 
relationship between satiation (i.e., negative 
polarity changes) and verbal conditioning. 
Parenthetically, it might be noted that this 
same criticism can be made of other studies by 
Das (1964a, 1968), and that in these same 
studies (as well as Das, 1964b) he was willing 
to conclude that there were satiation effects 
even though he obtained no significant statisti- 
cal differences. 
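
The point about what was actually correlated can be made concrete with a short sketch. The data below are hypothetical and are not Das's; they only show that a single coefficient computed over all subjects is based on signed polarity changes, including those of subjects whose polarity increased. The helper `pearson_r` is written out here so that the example is self-contained.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical subjects: post-minus-pre polarity change and a learning score.
# Negative change = satiation (by definition); positive change = generation.
polarity_change = [1.5, 1.0, 1.2, 1.6, -1.0, -1.4, -0.8, -1.2]
learning_score = [9, 8, 9, 10, 4, 3, 5, 4]

overall_r = pearson_r(polarity_change, learning_score)
n_satiated = sum(change < 0 for change in polarity_change)
n_generated = sum(change > 0 for change in polarity_change)
```

In this illustration half of the subjects contributing to `overall_r` show an increase in polarity, so the coefficient describes the relationship between learning and polarity change in either direction, not between learning and satiation alone.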

In a free recall situation, Roberts, King, and 
Reid (1967) found inhibition when subjects 
first repeated a word related to an eight-word 
list and were then asked to recall the eight- 
word list in a retroactive inhibition paradigm, 
but they found no such inhibition in a pro- 
active paradigm. 

Cook (1968) claimed that the effectiveness of 
a verbal reinforcer was reduced when third- 
graders repeated the reinforcers prior to a con- 
ditioning session. However, Cook also con- 
cluded that “the satiation effect . . . is not 
dependent on the semantic [italics added] 
characteristics of the satiated word. Thus, the 
effectiveness of a positive reinforcer was altered 
when the satiated word was either positive or 
negative . . . [p. 1085].” Further, it was re-
ported that for a satiation treatment consisting
of 0, 10, or 80 repetitions, there was no ob- 
served decrement in the effectiveness of the 
repeated word, whereas for 20, 30, and 40 
repetitions there was. This is difficult for the 
semantic satiation hypothesis to explain, 
especially for the 80-repetitions condition. 

Though verbal conditioning studies, espe- 
cially paired-associates studies, may provide 
evidence for semantic satiation, they cannot do 
so convincingly until appropriate control 
groups have been investigated in order to 
eliminate other plausible interpretations of 
these studies. In particular, inhibition by 
means of semantic satiation is only one possible 
alternative; inhibition may also occur as a re- 
sult of similarity of interpolated tasks or 
operations along nonsemantic dimensions. For 
instance, the rating of response items on seman- 
tic differential scales may be psychologically 
similar to the paired-associate task since the 
same words are being used in a slightly different
task, especially if one takes the view that in 
using the semantic differential one is “associat- 
ing” a word (i.e., the word to be rated) to one 
of the two bipolar adjectives (e.g., see Bous-
field, 1961). That is, one could argue that rating
FATHER on the GOOD-BAD scale is simply noting
which bipolar adjective the subject would be 
more likely to give as an associate to FATHER. 
Hence, the response of the paired-associate list 
would be a stimulus in the rating condition—a 
stimulus which might be emphasized by means 
of repetition. Such similar operations using the 
same words in different ways, rather than repe- 
tition, might be responsible for any negative 
effects on paired-associates learning. 

Purohit (1965) came to a similar conclusion 
when he had subjects undergo satiation treat- 
ment on eight numbers which were the pro- 
ducts of eight multiplication problems. Sub- 
jects then learned, by the anticipation method, 
four correct and four incorrect products to the 
eight multiplication problems. Using Kanun- 
go's stimulus-response interpretation, Purohit 
hypothesized that compared to a no-satiation 
control group, the satiation of the correct re- 
sponses should have facilitated the learning of 
the incorrect products, but should have hin- 
dered the learning of the correct products. This 
latter prediction was upheld, but there were no 
differences between the satiation and no-satia- 
tion groups regarding the learning of the in- 
correct products. Purohit concluded that 
repetition might induce reactive inhibition 
but “not effect the semantic structure of the 
satiated material." 

The above considerations, coupled with the 
negative evidence (Cook, 1968; Pyke et al.,
1966; the proactive condition of Roberts et al., 
1967) and results difficult to explain by means 
of the semantic satiation hypothesis (Kanungo 
& Lambert, 1964), lead one to be very cautious 
in interpreting these studies as evidence for 
semantic satiation. 

It can also be argued that there are many 
nonsemantic factors which have not been ruled 
out of studies which use performance in a
problem situation as a measure of semantic 
satiation. Cook and Vachon (1968) concluded 
that following repetition of a key word (e.g., 
STRING), a problem which was to be solved by 
use of an actual piece of string took signifi- 
cantly longer to solve as compared to a control 
group who repeated an irrelevant cue before 
the presentation of the problem. However,
Cook and Vachon's analyses show no signifi-
cant differences among groups differing on
number of repetitions (1, 15, 30, or 60) of the
word in the satiation treatment. The implica- 
tion is that 1 repetition leads to as much satia- 
tion as 60 repetitions, which is contrary to the 
satiation hypothesis. One might ask how the 
subjects (fourth-graders) viewed the experi- 
menter's purpose in running the study. 
Usually when teachers present problems to 
their pupils, they do not give the "answer" 
beforehand. Therefore, children may discount
a possible solution simply because it was men- 
tioned before the presentation of the problem, 
and this may go against the children's 
expectations. 

In a related study, Jakobovits (1965) pre- 
pared sets of 12 words (e.g., REFLECTION, DIS- 
PERSION, REFRACTION, BEAM, GLEAM, STREAM, 
STEAM, SCH SEEM, PERMISSION, DISCUS- 
SION, and ATTENTION) such that each set could 
be classified into two equal subsets either on 
the basis of semantic similarity or physical 
similarity such as rhyme or word length. Sub- 
jects repeated either one of the words in the set 
or an unrelated word before classifying the set. 
Jakobovits found more physical similarity 
solutions in the experimental group and inter- 
preted this as evidence that the semantic 
mediator needed for the semantic solution was 
satiated and therefore less available. However, 
the same result could be predicted even if the 
repetition were eliminated, or if a word similar 
in sound to some of the set words were re- 
peated. For instance, it might be predicted 
from a set interpretation that subjects would 
use more physical similarity solutions for the 
above word set even if subjects did not repeat 
REFLECTION but simply saw it beforehand or if
subjects saw, for instance, COLLECTION, a non- 
semantically related rhyme of half of the 
words. Until this possibility is eliminated, the 
results remain tenuous. 

Though both verbal learning studies and 
problem-solving studies may index an effect 
of repetition, it is seriously doubted that these 
effects are semantic in nature. 


SEARCH TIME AND VISUAL 
THRESHOLDS


Gampel (1966) had subjects search for a 
target word in an array of words following 0, 
5, 15, 30, 60, or 120 seconds of repetition of 
either the target word or a high associate of the
target word with the rationale that as meaning
decreased, search time should increase. For 
repetition on both types of words, the general 
results were that the shortest search times were 
obtained for a repetition duration of 15 sec- 
onds, the longest for a repetition duration of 
60 seconds. The latencies for the 0-second 
repetition condition fell between those for the 
above conditions. In a second experiment 
Gampel held repetition duration constant at 
60 seconds and varied the time (delay dura- 
tion) between the end of the repetition and the 
beginning of the search. For repetition of both 
types of words, the 5-second delay duration 
yielded the longest latencies, but for the 15- 
second delay duration condition, the latencies 
were shorter than for the 0-second delay dura- 
tion condition. 
Gampel’s explanation of her results follows: 


The Ss reported both auditory and visual fluctuations 
between the whole word and its component parts (e.g., 
BUTTER changed to BUT and HER; CARPET to CAR and 
PET). These changes would thus account for the in- 
creased search times, since the effective stimulus for the 
identification tasks was at least momentarily absent. If 
the effect achieved here and termed satiation is in fact 
due to this stimulus disorganization, search times for 
the word components and their associates should de- 
crease with repetition as the search times for the entire 
word and its associates increase [pp. 205-206].


Though it might be that Gampel's (1966) 
results were due in some way to verbal repeti- 
tion, it is not clear that they were due to seman- 
tic satiation. First, it is not clear why the 
visual fluctuations should account for increased 
search times for self-satiated items since it is 
conceivable that the self-satiated items could 
also be perceived in the reorganized way in the 
array, suggesting that the effective stimulus for 
identification may have still been present 
following satiation treatment. That is, in the 
array, CARPET could still have been identified 
as the reorganized CAR-PET without necessarily 
increasing search time. Second, it is possible 
that the longer search times following the 60- 
second repetition were due to a diffusion of 
attention effect resulting from the satiation 
treatment. The effect might not have been 
specific to the particular words the subject was
searching for. To test for this, a control condi-
tion with satiation treatment on a word differ-
ent from the test word should have been in-
cluded in Gampel’s study. The satiation hy-
pothesis would predict longer search times for
the satiation condition than for the different- 
word control condition. In a series of pilot 
studies (Hubbard, 1964, Appendix I), Gampel 
did use a silence control group and a control 
group which counted from 1 to 10 as compari- 
sons for the repetition group. Though she found 
the repetition group inferior to these control 
groups, neither control group allowed for the 
evaluation of nonspecific, nonsemantic effects 
due to repetition. 

The results of a study by Baras, Braun, Teft, 
and Pettigrew (1969, Experiment I), which
attempted to assess the effect of repetition on 
recognition thresholds were explained by 
Baras et al. (1969, Experiment II; see Foot- 
note 4) by the fact that in the repetition con- 
dition subjects saw the target words before 
their tachistoscopic presentation, unlike the 
no-repetition condition. Thus, both studies 
have explanations which do not depend on the 
semantic satiation hypothesis.


CONCLUSION 


The several methods of measurement which 
have been used to assess the effects of repeti- 
tion on the meaning of words have yielded 
questionable results. Although the results of 


been generally 
ratings, etc. Therefore, one might conclude
that what is needed now are studies using the
rate more precise controls. However, because
of the considerable number of studies that re-
sulted in negative findings, we are encouraged
to look for a fresh approach to the phenomenon. 

Most later studies dealing with semantic 
satiation have concentrated on the semantic 
aspects of subjects’ reports from the Severance 
and Washburn (1907) study, and have com- 
pletely ignored subjects’ reports from that 
study which dealt with changes in the percept- 
ual qualities of the words, for instance, the re- 
ports of regrouping of the letters (see also 
Gampel, 1966). These later studies made no 
mention of the possibility—and were not de- 
signed to detect it—that a subject’s percept of 
the distal stimulus (the physical word) is con- 
stantly changing throughout his continued 
viewing and/or repetition of the word. If these 
perceptual changes are taking place, then the 
experienced “loss of meaning” of the original
word (and here we are granting, despite the 
small number of subjects used in the Severance 
and Washburn study, that there was some sub- 
jective experience which was labeled as loss of 
meaning) might very well be a secondary 
phenomenon dependent upon changes or re- 
organizations in the subjects’ perception of the 
distal stimulus. Successive reorganizations of 
the original meaningful-word percept might 
very well be meaningless to the subject. How- 
ever, in this case we should not speak of a word 
"losing" its meaning (as have most other 
experimenters who have unsuccessfully tried to 
measure such a loss). Rather, we should say 
that the original percept of the word is no 
longer being experienced, and the percept 
which replaces the original one is meaningless. 
For instance, “blood” might be perceived after 
a period of visual fixation as “b-loo-d,” and 
this percept might be reported as being foreign 
or meaningless, as was reported by Severance 
and Washburn. This report would not indicate 
that the percept “blood” has become meaning- 
less, but that “b-loo-d,” the new percept, has 
no meaning. Presumably, if the subject were 
to perceive “blood” again, it would still have 
meaning. 

Measures of meaningfulness have been used 
after the period of satiation treatment, and it 
is quite possible that at the time of measure- 
ment the original percept of a word has been 
reinstated. Perceptual reorganizations are ab- 
rupt, and relatively brief pauses or absences of 
the distal stimulus are enough to reinstate the
original, most stable percept. This is true, for
example, of the Necker cube (see, e.g., Orbach, 
Zucker, & Olson, 1966); if this is also true of 
satiation treatment, then one would expect 
that a loss of meaning would not be indicated 
by a measure of meaningfulness, a possibility 
which is not contradicted by the reviewed 
studies. 

Some scant evidence which supports the 
above conceptualization is the fact that for 
Severance and Washburn’s (1907) subjects, 
the reports of meaninglessness came only after 
reports of perceptual changes. For stronger 
support of the above interpretation, we turn 
to the phenomenon which has been labeled the 
verbal transformation effect. 

Warren (1968) has reported that when one 
listens to a recording of identical repetitions of 
a single word or phrase, “abrupt illusory 
changes are experienced, frequently involving 
considerable phonetic distortion [p. 261],” and 
has referred to this phenomenon as the verbal 
transformation effect. 

It is interesting that those experimenters 
studying the verbal transformation effect ask 
their subjects to report perceptual changes, 
while those studying semantic satiation gen- 
erally elicit responses from their subjects on 
measures of meaningfulness, and do not ask 
them for perceptual reports, even though both 
phenomena involve repetition. 

Warren (1968) claimed that the “loss of 
semantic organization” associated with seman- 
tic satiation is avoided in the verbal transfor- 
mation effect. Yet, if changes in the perceived 
word are taking place (e.g., “say” becoming 
“ace”) then it is obvious that the original 
semantic organization of the word is momen- 
tarily lost. What Warren might mean here is 
that the new perceived sound is also a meaning- 
ful one, that is, another word. That Warren 
should find this result to be so prevalent is not 
surprising since Warren (e.g., 1961) instructed 
his subjects to report words. When in contrast, 
subjects are instructed that “what you hear 
may be meaningful or meaningless, and the 
words may be English or nonsense," they often 
report perceiving nonsense forms as well as 
words (Taylor & Henning, 1963; this also 
supports the conjectured importance of in- 
structions in semantic satiation). Moreover, 
Warren and Gregory (1958) recognized that 
reorganizations of a word also occurred when
a subject repeated the word aloud for himself.
These facts lead us to suspect that we have all 
been studying the same phenomenon in differ- 
ent ways. 

Warren (1968) explicitly stated: “It seems 
that the neuromuscular activity of producing 
speech sounds prevents verbal transformations 
by inhibiting illusory phonetic changes [p.
268].” But there is no reported evidence to
support his statement since investigators of the
effects of having subjects repeat words have
attempted to index loss of meaning using mea-
sures of meaningfulness. These investigators
have not asked for perceptual reports.

Warren (1968) further stated, “When one 
continues to stare at a printed word, changes 
analogous to verbal transformations do not 
occur [p. 268].” Yet Severance and Washburn
(1907), whose study Warren cited in support
of the above statement, did report changes
which they classified as regroupings of the
letters of the word and changes in the percep-
tion of various letters.

Finally, Warren claimed that semantic satia-
tion involves a “progressive disorganization
until we are left with a meaningless jumble of
speech sounds [p. 268],” which he referred to
as perceptual decay, and implied that this is
not the same as illusory phonetic changes. That
“meaningless jumble of sounds” is, however, a
new percept, or what Warren called a percept-
ual reorganization. Though Warren claimed
that perceptual reorganization and perceptual
decay are two different processes, we are
claiming that Warren’s “perceptual decay” can
be better conceptualized as another case of
perceptual reorganization. That is, “perceptual
decay” is the reorganization of a meaningful
word into a new percept which is meaningless
to the subject. The original percept does not
“decay,” that is, become increasingly less
meaningful. Again it might be noted that
Warren might have inhibited reports of seman-
tically meaningless percepts by the type of
instructions used.

We do not disclaim that there are certain
types of changes, which occur when verbally
repeating a word, that do not occur with pro-
longed visual inspection or with continued
hearing of repetition of a word. We are
suggesting that verbal transformations prob-
ably do occur in all instances, and so any ex-
perienced loss of meaning might be dependent
upon them.

We suggest that the subjective reports of 
“loss of meaning” are secondary, dependent 
upon perceptual changes and that future 
studies of semantic satiation should be set up
so as to be able to detect perceptual changes,
as were the early studies from Titchener’s
laboratory as well as Warren’s (1968) studies.

A further implication of the above con-
ceptualization is that reported changes of
meaning are qualitative changes, as are the
perceptual changes on which they depend.
Measures of meaningfulness, which presumably
indicate quantitative changes, are not suitable


more of its meaning, but a succession of per- 
cepts, some having meaning for the subject, 
and others not. This view contrasts with the
studies reviewed in this paper.


because the original percept of the distal stimu-
lus becomes meaningless, perceptual reorgani-
zation is precipitated. While this is a possi-
bility, it must again be pointed out that for
the subjects in the Severance and Washburn
(1907) study, reports of loss of meaning came
only after the reports of changes in the per-
ceptual qualities of the words. But even if this
alternative conceptualization were valid, the
use of measures of meaningfulness following
repetition would still not be plausible since one
would not know whether a subject's percept
had changed.

The recommendation is that subjective re- 
ports should, if possible, be elicited during the 
satiation treatment by instructing subjects to 
describe their experiences. Instructions should
be as neutral as possible so that subjects will
not be set to respond in any particular way.
This should be done regardless of other mea-
sures used.

BEHAVIORAL CONTRAST:
REINFORCEMENT FREQUENCY OR RESPONSE SUPPRESSION? 


BETTY JO FREEMAN1


Southern Illinois University 


A review of recent studies dealing with behavioral contrast effects revealed that 
there are currently two major interpretations of the phenomena. One involves the 
emotional consequences of response suppression, the other, changes in reinforcement 
frequency. It is impossible to choose between these two interpretations of contrast 
until a method is developed which adequately separates the effects of reinforcement 
frequency and rate of responding. Therefore, special emphasis in this paper is placed 
on studies which have attempted to separate the effects of these two variables. The 
methodological problems inherent in any attempt to separate reinforcement fre- 
quency and response rate are reviewed and discussed. 


A multiple schedule is one in which two or 
more schedules of reinforcement are alternated 
with a different exteroceptive stimulus asso- 
ciated with each. This provides a technique 
for bringing various behaviors within a single 
organism under stimulus control (Ferster & 
Skinner, 1957). However, the extent to which 
these schedules are useful as control techniques 
depends upon the relative independence of the 
various components (Herrnstein & Brady, 
1958). It is often assumed that the performance 
on one of the multiple schedules is essentially 
the same as the performance observed when 
the same schedule is programmed alone 
(Ferster & Skinner, 1957). Yet, there is a large 
body of evidence which suggests that this is 
not necessarily the case. It has been found 
that the frequency of responses in the presence 
of one stimulus often depends in part on the 
consequences of responding in the presence 
of a different stimulus (e.g., Pavlov, 1927; 
Solomon, 1943; Verplanck, 1942). In operant 
conditioning, where response rate is the de- 
pendent variable, the rate of responding during 
the presentation of one of the stimuli of a 
multiple schedule may be altered by chang- 
ing the schedule of reinforcement associated 
with the other stimulus (e.g., Brethower &
Reynolds, 1962; Reynolds, 1961b; Reynolds
& Catania, 1961; Smith & Hoy, 1954).
Reynolds (1961b) has called this phenomenon
an interaction effect and has pointed out that
these effects can be studied by changing one
component of a multiple schedule while holding
the other component constant.

1 This manuscript is based in part on a dissertation
submitted to Southern Illinois University in partial
fulfillment of the requirements for the PhD degree.
The author wishes to extend her gratitude to Donald
Meltzer without whose guidance and criticisms this
manuscript would not have been possible.

Interaction effects have typically been found 
when an operant discrimination is established 
(Herrick, Myers, & Korotkin, 1959; Schuster, 
1959; Smith & Hoy, 1954; Terrace, 1963b, 
1966a). Reynolds (1961a, 1961d, 1961e) re- 
ported that when the second component of a 
multiple schedule was changed to extinction, 
that is, a discrimination schedule, there was 
not only a reduction in rate of responding to 
zero in the extinction component (S2) but also
an increase in rate in the unchanged component
(S1). The increased rate in S1 occurred in spite
of the fact that reinforcement frequency did 
not change in that component. 

Some other experiments have suggested that 
response rate changes in one component of a 
multiple schedule whenever the frequency of 
reinforcement changes in the second compo- 
nent rather than just when the second com- 
ponent is reduced to an extinction schedule. 
Dews (1958) found that on a multiple fixed-
interval (FI) fixed-ratio (FR) schedule the
initial pause in the FI component was affected 
by the number of preceding ratio segments and 
was usually longer after several ratio segments. 
Findley (1958) showed that the rate of re- 
sponding on a variable-interval (VI) 6-minute 
schedule of reinforcement in one component 
of a multiple schedule increased when the 
schedule in the other component had a mean
interval greater than 6 minutes. Reynolds 
(1961b) has also suggested that whenever the 
frequency of reinforcement changes in one 
component, the change in the rate of respond- 
ing in the other component is in the opposite 
direction.

Many phenomena similar to those found in 
an operant discrimination on a multiple 
schedule have been described elsewhere (e.g.,
Amsel, 1958, 1962; Azrin, 1956, 1960; Helson,
1964; Keller & Schoenfeld, 1950; Wertheim,
1965). However, the present review is limited
to a consideration and description of the inter-
action effects which occur on a multiple
schedule of positive reinforcement and the
procedures by which these effects are generated. 


DEFINITION OF CONTRAST 


Interactions among components of a multiple 
schedule may be described in terms of the
direction of the rate change (Reynolds, 1961b).
In the typical contrast experiment, a base line
of responding on a single VI schedule in both
S1 and S2 is first established. Then the schedule
of reinforcement in S2 is altered, either in-
creased or decreased, and changes in S1 re-
sponse rate are recorded. When this procedure
is followed, there are a number of possible
results. Rate in the changed component may
either increase or decrease. At the same time,
rate in the unchanged component may increase,
decrease, or remain unaffected. If the rate in
the unchanged component increases and the 
rate in the changed component decreases, a 
positive contrast effect is said to occur (Skinner, 
1938). On the other hand, if the rate in the un- 
changed component decreases while the rate 
in the changed component increases, a negative 
contrast effect is said to occur. A rate increase 
in both the changed and the unchanged com- 
ponents is referred to as a positive induction 
effect, while a rate decrease in both components 
is referred to as a negative induction effect 
(Skinner, 1938). 
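
The four outcomes just defined can be written as a small decision rule. The following Python sketch is illustrative only: the function name, the tolerance parameter, and the example rate changes are assumptions introduced here and do not come from the studies cited.

```python
def classify_interaction(changed_delta, unchanged_delta, tol=0.0):
    """Label an interaction from rate changes relative to base line.

    changed_delta:   rate change in the changed component (S2)
    unchanged_delta: rate change in the unchanged component (S1)
    tol:             hypothetical threshold below which a change is
                     treated as "no change"
    """
    def sign(x):
        if x > tol:
            return 1
        if x < -tol:
            return -1
        return 0

    s2, s1 = sign(changed_delta), sign(unchanged_delta)
    if s1 == 0 or s2 == 0:
        return "unclassified"          # one of the rates did not change
    if s1 > 0 and s2 < 0:
        return "positive contrast"     # S1 up, S2 down
    if s1 < 0 and s2 > 0:
        return "negative contrast"     # S1 down, S2 up
    if s1 > 0 and s2 > 0:
        return "positive induction"    # both up
    return "negative induction"        # both down


# Hypothetical changes in responses per minute after S2 is altered.
print(classify_interaction(changed_delta=-40.0, unchanged_delta=12.0))
```

The rule makes plain that contrast and induction differ only in whether the two rates move in opposite directions or in the same direction.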

The study of behavioral contrast effects has 
become an active area of research because the 
classical interpretation of discrimination learn- 
ing cannot explain the data. The classical view, 
as first elaborated by Spence (1936) and later 
by Hull (1952), maintained that an analysis 
of successive discrimination learning should 
require no new concepts beyond those of simple
conditioning, extinction, and stimulus general- 
ization. The Hull-Spence theory assumed that 
the cumulative effects of reinforced responding 
would build up habit strength to a positive 
stimulus (S+) and that conditioned inhibition 
would accumulate to a negative stimulus (S—) 
as a result of no reinforcement. Habit and 
inhibition were assumed to generalize to similar 
stimuli, with the amount of generalization de- 
creasing with decreasing similarity. The net 
tendency to respond to any stimulus then is 
given by the generalized habit minus gen-
eralized inhibition to a particular stimulus
(Hilgard & Bower, 1966). One important im-
plication of this stimulus generalization as-
sumption is that responding in the presence of
one stimulus should be positively related to 
responding in the presence of a second stimulus. 
In other words, when inhibition causes a de- 
crease in responding to S—, rate of responding 
should decrease in S+ as well. Clearly, this is 
not the case when contrast effects occur. 
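
The classical prediction can be made concrete with a simple numerical sketch. The Gaussian form of the generalization gradient, the stimulus values, and the strengths used below are assumptions chosen for illustration; the Hull-Spence account as summarized here specifies only that generalization decreases with decreasing similarity.

```python
import math


def generalized(strength, center, stimulus, width):
    """Strength conditioned at `center`, generalized to `stimulus`.

    The Gaussian gradient is an assumption made for this sketch.
    """
    return strength * math.exp(-((stimulus - center) ** 2) / (2.0 * width ** 2))


def net_tendency(stimulus, s_plus, s_minus, habit, inhibition, width):
    """Generalized habit minus generalized inhibition at `stimulus`."""
    return (generalized(habit, s_plus, stimulus, width)
            - generalized(inhibition, s_minus, stimulus, width))


# Hypothetical stimulus values (arbitrary units) and strengths.
S_PLUS, S_MINUS, WIDTH, HABIT = 550.0, 570.0, 20.0, 10.0

before = net_tendency(S_PLUS, S_PLUS, S_MINUS, HABIT, inhibition=0.0, width=WIDTH)
after = net_tendency(S_PLUS, S_PLUS, S_MINUS, HABIT, inhibition=5.0, width=WIDTH)

# Under these assumptions the tendency to respond to S+ falls as
# inhibition accumulates to S-, the opposite of a contrast effect.
print(before, after)
```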

As a result of this apparent contradiction of 
the prediction made by the classical interpreta- 
tion of discrimination learning, many investi- 
gators have attempted to isolate the variables 
which are responsible for producing contrast 
effects (e.g., Bloomfield, 1967a, 1967b; Catania,
1961; Nevin & Shettleworth, 1966; Reynolds, 
1961a, 1961c; Terrace, 1963a, 1966a, 1966b). 
A major problem in interpretation has arisen 
from the apparent confounding of rate of 
responding and the frequency of reinforcement 
in the changed component (S2) of the schedule. 
Typically, when the reinforcement frequency 
in the second component of a two-component 
multiple schedule is reduced, there is an in- 
crease in rate of responding in the other 
component, that is, a positive contrast effect.
However, there is also a decrease in the rate 
of responding in the changed component. The 
question then becomes which of these two 
variables is responsible for producing the con- 
trast effects. 


FREQUENCY OF REINFORCEMENT VERSUS
RESPONSE SUPPRESSION 


Reynolds (1961a) has hypothesized that fre- 
quency of reinforcement is a more powerful 
variable than rate of responding in producing 
contrast effects. Further, it is the relative,
rather than the absolute, rate of reinforcement
which is responsible for contrast. Specifically,
Reynolds (1961a) stated that the frequency of
reinforcement in the presence of a given
stimulus “relative to the frequency during all
the stimuli that control an organism's behavior
[p. 70],” determines the rate of responding in
the presence of that stimulus. Later studies 
(Catania, 1961; Nevin, 1968; Reynolds, 1961c, 
1963) have shown that the magnitude of the 
contrast effect, as measured by the change in 
rate of responding in the constant component, 
is inversely related to reinforcement frequency 
in the other component. 
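
The notion of relative reinforcement frequency can be illustrated with a short calculation. The sketch below assumes a two-component multiple schedule with equal time in each component and uses programmed rather than obtained reinforcement frequencies; the function names and the convention of passing `None` for extinction are introduced here for illustration only.

```python
def reinforcers_per_hour(vi_mean_minutes):
    """Programmed frequency of a VI schedule; extinction is passed as None."""
    if vi_mean_minutes is None:
        return 0.0
    return 60.0 / vi_mean_minutes


def relative_frequency(target_per_hour, other_per_hour):
    """Share of all reinforcement that is delivered in the target component."""
    total = target_per_hour + other_per_hour
    if total == 0:
        raise ValueError("no reinforcement in either component")
    return target_per_hour / total


vi3 = reinforcers_per_hour(3)        # about 20 reinforcements per hour

# Multiple VI 3-minute VI 3-minute: half of all reinforcement is in S1.
print(relative_frequency(vi3, reinforcers_per_hour(3)))

# Multiple VI 3-minute extinction: all reinforcement is in S1, although
# the absolute frequency in S1 has not changed.
print(relative_frequency(vi3, reinforcers_per_hour(None)))
```

The relative frequency in the constant component rises from .50 to 1.00 when the other component is changed to extinction, which is the change that the relative-frequency account holds responsible for the rate increase.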

Several more specific features of the con- 
trast effects described by Reynolds (1961a, 
1961b) have been identified. Shettleworth and 
Nevin (1965) found that contrast effects did 
not appear when the magnitude of reinforce- 
ment was varied in one component of a multiple 
schedule. Reynolds (1968a) found that resist- 
ance to extinction was increased in the presence 
of a stimulus correlated with contrast effects. 

Bloomfield (1967b) reported that contrast 
persisted through training and retraining of 
the same VI (S+) and extinction (S—). He
found that response rate in the VI component 
continued to be higher during multiple VI ex- 
tinction than during multiple VI VI even when 
the different schedules were presented several 
times. Bloomfield has labeled this a permanent 
contrast effect. What was unexpected in these 
results was that VI rate in the unchanged 
component during both multiple VI VI and 
multiple VI extinction continued to increase 
in successive presentations. Bloomfield also 
identified what he called transient contrast 
effects. Specifically, contrast appeared at the 
beginning of a VI period in spaced VI extinc- 
tion (presentations of the schedule were sepa- 
rated by 24 hours), but it declined within a 
1-hour session on VI. Similar types of contrast 
effects have been described by other investi- 
gators (Nevin & Shettleworth, 1966; Terrace, 
1963a, 1963b, 1966a, 1966b). 

Terrace (1966a) has shown that the perma- 
nent contrast effects as described by Bloom- 
field (1966, 1967a) disappear after prolonged 
exposure to the discriminative stimuli. This 
apparent difference may have resulted from 
the fact that different procedures were used 
in the two studies. Bloomfield employed a 
traditional discrimination procedure in which
the two components were both of a fixed 
duration and were alternated. Terrace, on the 
other hand, employed a correction procedure 
whereby each response to S2 (extinction) de-
layed the termination of the S2 for 30 seconds.
When a correction procedure was used (Ter- 
race, 1966a), response rate approached a peak 
and then declined to a greater than base-line 
value. Thus, contrast effects were said to dis- 
appear. When a correction procedure was not 
used (Bloomfield, 1966, 1967a) during dis- 
crimination training, the rate of responding 
to S1 (VI) increased and then stabilized at some
asymptotic level above base line. 

A second and perhaps more important pro- 
cedural difference involved the duration of 
the discrimination training period. Bloomfield 
(1966) continued training for only 10 to 17 
sessions, whereas Terrace (1966a) continued 
training for 60 sessions. The fact that ex- 
tended discrimination training resulted in the 
gradual disappearance of contrast effects led
Terrace to suggest that the emotional effects
of response suppression play an important role 
in the origins of contrast. Earlier results 
(Terrace, 1963a, 1963b) had shown that con-
trast effects did not occur when a discrimina-
tion was learned without errors. Responding
to S2 (extinction) cannot be considered to have
been suppressed in this case because responding
to S2 never occurred. On the other hand, in a
discrimination learning task in which errors 
did occur, or in a discrimination between 
stimuli correlated with independent schedules 
in which responding to one stimulus was 
punished, or in a discrimination between 
stimuli correlated with VI and differential re- 
inforcement of low rates of responding (DRL) 
schedules, behavioral contrast occurred. Thus, 
Terrace concluded that the suppression of re- 
sponding to one of two alternating stimuli, 
whether accomplished by nonreinforcement, 
punishment, or a reinforcement contingency 
which requires a low rate of responding, is a 
sufficient condition for contrast to occur. He 
further pointed out the similarity of the condi- 
tions which result in contrast effects and those 
which result in emotional behavior (Terrace, 
1966b) and hypothesized that behavioral con-
trast is a by-product of frustration or similar
emotional responses. With extended train-
ing the emotional responses adapted out,
and as the results indicated, contrast effects
disappeared.

Terrace's explanation of contrast effects is 
similar to that offered by Amsel (1958, 1962) 
in an attempt to account for contrasted condi- 
tions of reinforcement in an alley. Specifically, 
Amsel has proposed that lowering the rein- 
forcement rate in one goal box (or in one 
component of a multiple schedule) produces a 
frustration response. The frustration response 
becomes conditioned to apparatus cues, and 
the motivational consequences of frustration 
cause a resulting increase in running speed 
(or rate of responding in S1, the unchanged
component). Whether or not contrast effects
and the frustration effect result from manipula-
tion of the same variables and are indeed the
same phenomena is still open to empirical
question.

Reynolds and Limpo (1968a) found that
contrast effects on a multiple VI extinction
schedule did begin to disappear after 70 to 90
sessions of exposure to the schedule. However,
when the multiple VI extinction schedule was
changed to a multiple VI VI, rate of respond-
ing in the unchanged VI decreased to below
base-line rate, a negative contrast effect. These
investigators pointed out that attempts to 
explain contrast by appeal to an emotion 
aroused by nonreinforced responding cannot 
account for negative contrast after prolonged 
discrimination training when positive contrast 
has already disappeared. In order to do so, the 
theory would have to maintain that emotion 
is aroused by nonreinforcement and results 
in an increase in rate of responding during the 
unchanged component (S1). Then, the emo-
tional response is habituated, and contrast
disappears with time. Finally, the effect is
somehow reversed in sign by the recurrence of 
reinforcement producing the negative contrast 
effects. Reynolds and Limpo maintained that 
this type of explanation "stretches" the emo- 
tional response hypothesis. Further, they 
pointed out that a more parsimonious and de- 
scriptive explanation is that both 


positive and negative contrast are orderly phases in the 
dynamics of responding in the presence of a stimulus
under changes in conditions of responding and reinforce- 
ment in the presence of a different stimulus [p. 323]. 

Beale and Winton (1968) have argued that
the emotional response hypothesis can handle
the Reynolds and Limpo data. They main-
tained that a reduction of reinforcement fre-
quency in S2 (the changed component) pro-
duces an excitatory effect on S1 (the unchanged
component). It follows that an increase in
reinforcement frequency during S2 will lead
in turn to the development of inhibitory con-
trol by S2 on responding in the presence of S1
since the relative frequency of reinforcement
in S1 will be reduced.

Reynolds (1968b) responded that a decrease 
in response rate in the presence of a stimulus 
may not be evidence that the stimulus has 
inhibitory properties. An alternative explana- 
tion is that the decreased rate results from a 
decrease in the excitatory properties of the 
stimulus. The decrease in excitatory properties 
can be correlated with decreases in reinforce- 
ment frequency just as inhibition is said to be. 
Thus, Reynolds argued that either of these ex- 
planations is plausible and there is no need 
to assume that S2 develops inhibitory control
over behavior. 

One implication of the emotional response 
hypothesis is that there should be a negative
correlation between the magnitude of the re-
duction in response rate in the changed com-
ponent (S2) and the magnitude of the contrast
effect. However, Reynolds (1968c) has shown
that the rate of responding to S1 during the
formation of a discrimination is not correlated
with the rate of responding during the just
previous presentation of S2. Pigeons were ex-
posed to 3 minutes of a red key alternated with
3-minute periods of a green key. Each stimulus
was presented 30 times per session. Both the
red and the green keys were correlated with a
VI 3-minute schedule of reinforcement. After
responding had stabilized in the presence of 
each color, and rates of responding to the two 
colors were approximately equal, responding 
in the presence of the green key was extin- 
guished. Rate in the VI component increased 
while rate in the extinction component de- 
creased, a positive contrast effect. However, 
when rate of responding on the red key (VI) 
was plotted as a function of the rate of respond- 
ing on the preceding green key (extinction), no 
orderly relationship was found. Later, the 
same procedure was employed except that re-
sponding in S1 was now reinforced on an FR
150 schedule. The results were identical to 
those obtained with the VI schedule. While 
these data do not rule out an effect of respond- 
ing in extinction on behavioral contrast, they 
do make it difficult to account for the develop- 
ment and magnitude of contrast by appealing 
directly to the prevailing rate of responding in 
the changed component of a multiple schedule. 
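
The analysis described above, in which the rate in each presentation of the VI stimulus is related to the rate in the immediately preceding presentation of the extinction stimulus, might be sketched as follows. The data layout, the function names, and the numbers in the example are hypothetical. The text reports only that one rate was plotted as a function of the other, so the use of a Pearson coefficient to summarize the relationship is an assumption of this sketch.

```python
from math import sqrt


def preceding_pairs(session):
    """Pair each S1 rate with the rate in the S2 presentation just before it.

    `session` is a list of (stimulus_label, responses_per_minute) tuples in
    the order in which the components were presented.
    """
    pairs = []
    for previous, current in zip(session, session[1:]):
        if previous[0] == "S2" and current[0] == "S1":
            pairs.append((previous[1], current[1]))
    return pairs


def pearson_r(pairs):
    """Pearson correlation between the two rates in each pair."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a rate that never varies cannot be correlated")
    return sxy / sqrt(sxx * syy)


# Hypothetical alternating presentations: red key (S1, VI), green key (S2, extinction).
session = [("S1", 62.0), ("S2", 14.0), ("S1", 71.0), ("S2", 9.0),
           ("S1", 66.0), ("S2", 3.0), ("S1", 74.0)]
print(pearson_r(preceding_pairs(session)))
```

A coefficient near zero across many presentations would correspond to the absence of an orderly relationship described in the text.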


STUDIES WHICH ATTEMPT TO SEPARATE
REINFORCEMENT FREQUENCY AND 
RESPONSE SUPPRESSION 


In order to choose between the relative fre- 
quency of reinforcement and emotional re- 
sponse suppression explanations of contrast 
effects, it is necessary to develop an experi- 
mental procedure which adequately separates 
the effects of decreased responding and de- 
creased frequency of reinforcement. Several 
attempts to do this have been made. Reynolds 
(1961a) employed a schedule of “reinforcement 
for not responding” as the second component 
of a two-component multiple schedule, the 
first component being a VI 3-minute. In this 
schedule, food was available in the second 
component when the pigeon had not pecked 
the key for a specified time (t) (in this study
50 and 75 seconds). The time without responses 
was measured from the start of the stimulus 
with which the schedule was correlated or 
from the last response in the presence of that 
stimulus. Each response initiated a new in- 
terval of not responding, and the interval was 
terminated by presenting a reinforcement after 
t seconds of no responding. Such a schedule 
involves the differential reinforcement of be- 
havior other than key pecking and is referred 
to as a DRO schedule. The multiple schedule 
employed was called a multiple VI 3-minute 
DRO 50-second. This procedure allowed a 
near zero rate of responding to be maintained, 
as would be the case if an extinction schedule 
were in effect, but at the same time provided 
a method whereby reinforcement frequency 
could be manipulated without changing the re- 
sponse rate. When reinforcement frequency in 
the DRO component was the same as that in 
the VI 3-minute component, no contrast effects 
were obtained. As the reinforcement frequency 
in the DRO component decreased, while the 
first component remained constant, contrast
effects were produced. 
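
The DRO contingency described above amounts to a resettable timer. The following Python sketch makes that logic explicit under two assumptions that the text does not state: that the timer restarts after each reinforcement, and that reinforcement delivery itself takes no time. The function and argument names are hypothetical.

```python
def dro_reinforcement_times(response_times, t, duration):
    """Times (seconds from stimulus onset) at which a DRO t-second
    schedule delivers reinforcement within one component.

    response_times: times of key pecks during the component
    t:              required interval without a response
    duration:       length of the component in seconds
    """
    events = sorted(r for r in response_times if 0 <= r < duration)
    reinforcers = []
    timer_start = 0.0          # the interval is timed from stimulus onset
    i = 0
    while True:
        due = timer_start + t
        if due > duration:
            break
        if i < len(events) and events[i] < due:
            # A response before the interval elapses starts a new interval.
            timer_start = events[i]
            i += 1
        else:
            # t seconds have passed without a response: reinforce.
            reinforcers.append(due)
            timer_start = due  # assumption: timing restarts after reinforcement
    return reinforcers


# A 3-minute component on DRO 50-second.
print(dro_reinforcement_times([], t=50, duration=180))
print(dro_reinforcement_times([10, 40, 120], t=50, duration=180))
```

With no responses the component yields three reinforcements; the three responses in the second call reduce this to two. The obtained reinforcement frequency on DRO therefore depends on how completely responding has been eliminated.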

Terrace (1966b) has criticized Reynolds’ 
procedure on the grounds that the stimulus 
correlated with DRO had previously been cor- 
related with extinction. It is very likely that 
few responses occurred when the DRO schedule 
was in effect. Thus, Reynolds’ failure to obtain 
contrast under the multiple VI DRO procedure 
could be attributed either to an equal rein- 
forcement density in both components or to 
the absence of an extinction curve, that is, 
response suppression, in the presence of DRO. 

Recently, Nevin (1968) has replicated and 
extended Reynolds’ (1961a) DRO data. In this 
study, pigeons were trained to respond equally 
on identical VI schedules in both components. 
The animals were then trained to respond 
differentially by changing one component (S2) 
to either extinction or DRO. Contrast occurred 
when extinction was correlated with S2, but
not when a DRO schedule with the same fre-
quency of reinforcement as the VI was corre-
lated with S2. When the DRO reinforcement
frequency was decreased, contrast did occur.
Relative rate of responding in S1 (the un-
changed component) was found to be an in-
creasing monotonic function of the relative
frequency of reinforcement in S1.

Nevin's results must be considered in light 
of two methodological problems. First, the 
results are subject to the same criticism that 
Terrace made of Reynolds’ data. Since the 
subjects received exposure to four different 
multiple VI DRO schedules with multiple 
VI extinction sessions interspersed, the ab- 
sence of contrast in the multiple VI DRO 
schedule where reinforcement frequency was 
equal in both components, may have resulted 
from the absence of an extinction curve in 
DRO. Another problem results when animals 
are switched from a multiple VI VI schedule 
to a multiple VI DRO schedule. Immediately 
after the change from the VI to the DRO 
schedule, the animal is still responding as it 
would on VI. The bird must learn “not to 
respond” in order to obtain reinforcement. 
Essentially, this is a response suppression situa- 
tion. However, it should also be noted that 
until the animal acquires the DRO behavior, 
reinforcement frequency in S2 will be lower
than in S1 even though the schedules in each
component are identical. Thus, from either
Terrace’s or Reynolds’ point of view, contrast 
should appear, at least initially, when a 
multiple VI VI is changed to a multiple VI 
DRO schedule even though reinforcement fre- 
quencies are theoretically equal. Since Nevin 
did not report the early DRO acquisition data, 
there is no way to determine if contrast or a 
decrease in reinforcement occurred. 

Dunham (1968) has pointed out that the 
DRO data can also be accounted for if the 
DRO contingency produced response patterns 
incompatible with rate increases in the con- 
stant component. Thus, resolution of the re- 
sponse rate-reinforcement frequency problem 
would be dependent upon the extent to which 
the DRO schedule produces behavior patterns 
incompatible with a rate increase in the alter- 
nate component. Thus, it seems that the DRO 
procedure does not adequately separate the 
effects of response rate and reinforcement 
frequency. 

Another control procedure, which provides
a method of varying reinforcement frequency
while holding response rate constant, involves
the use of concurrent schedules. A concurrent
schedule is one in which either of two inde-
pendent schedules programmed simultaneously
determines the availability of reinforcement 
(Ferster & Skinner, 1957). Catania (1961) 
performed an experiment using three proce- 
dures in which pigeons could respond on either 
of two keys, one either green or yellow (the 
multiple key) and the other red (the non- 
multiple key). Procedure A involved establish-
ing base-line performance and was a con-
current VI 3-minute VI 3-minute schedule.
On the multiple key, the yellow and green 
stimulus lights alternated while a red light 
was always on the nonmultiple key. During 
Procedure B, multiple VI 3-minute extinction 
on the multiple key was concurrent with VI 
3-minute on the other key. The multiple key 
was either green (VI 3-minute) or yellow 
(extinction) while the second key was always 
red (VI 3-minute). When this latter procedure 
was in effect, there was an increase in VI rate 
on both keys with the greater increase occur- 
ring on the multiple key. 

Catania employed still another procedure 
(Procedure C) in which the multiple-key 
schedule remained multiple VI 3-minute ex-
tinction. Responses on the multiple-schedule
key were never reinforced during the yellow 
extinction stimulus, but the multiple-key pro- 
grammer continued to operate, programming 
reinforcements for responses on the red key 
(the nonmultiple key). The same number of 
reinforcements were available as when the 
concurrent schedule consisted of multiple VI VI 
on one key (alternating green and yellow 
stimuli in Procedure A) and a simple VI 
schedule on the other (red) key. Under this 
revised procedure, however, the multiple VI 
3-minute extinction schedule on one key was 
concurrent with a mixed VI 3-minute VI
1.5-minute schedule on the nonmultiple key.
(A multiple schedule and a mixed schedule are 
procedurally the same except for the presence 
of exteroceptive stimuli correlated with each 
component on the multiple schedule. Different 
exteroceptive stimuli are not correlated with 
the components of a mixed schedule.) When 
this procedure was in effect, the total reinforce- 
ment frequency for responses on both keys did 
not change as the schedule on the multiple key 
switched between VI and extinction. However,
as the schedule on the multiple key was alter- 
nated, several changes in response rate were 
observed. When VI 3-minute on the multiple 
key was concurrent with VI 3-minute on the 
red (nonmultiple) key, response rate showed no 
increase or decrease. When the extinction
(yellow key) component was concurrent with
VI 1.5-minute on the red key, there was a
decrease to zero in the rate to the yellow
stimulus and a corresponding increase in rate
to the red stimulus. As the schedule in the
multiple key was switched between VI and
extinction, however, VI rate on that key re-
mained unaffected, that is, contrast effects
were eliminated. These results led Catania to
conclude that a decreased rate of reinforcement,
rather than a decrease in response rate during
an extinction component of a multiple schedule,
is a necessary condition for contrast. It might
be noted here that Catania failed to consider 
the fact that when Procedure C was in effect, 
the reinforcement frequency on the non- 
multiple key increased, and consequently, 
there was an increase in response rate on that 
key. This increase in response rate on the non- 
multiple key may have been as important as 
the reinforcement frequency variable in elimi-
nating the contrast effects. 
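
The bookkeeping behind Procedure C can be checked with a short calculation. The sketch assumes, as the description above indicates, that both programmers are VI 3-minute and that the multiple-key programmer assigns its reinforcements to the red key while the yellow stimulus is present. Frequencies are programmed rather than obtained, and the function names are hypothetical.

```python
def per_hour(vi_mean_minutes):
    """Programmed reinforcements per hour on a VI schedule."""
    return 60.0 / vi_mean_minutes


def procedure_c_rates(stimulus, multiple_vi_minutes=3.0, red_vi_minutes=3.0):
    """Programmed reinforcements per hour on each key under Procedure C."""
    multiple = per_hour(multiple_vi_minutes)
    red = per_hour(red_vi_minutes)
    if stimulus == "green":      # VI component on the multiple key
        return {"multiple_key": multiple, "red_key": red}
    if stimulus == "yellow":     # extinction component on the multiple key
        # The multiple-key programmer keeps running, but its reinforcements
        # are now collected on the red key.
        return {"multiple_key": 0.0, "red_key": red + multiple}
    raise ValueError("stimulus must be 'green' or 'yellow'")


for stimulus in ("green", "yellow"):
    rates = procedure_c_rates(stimulus)
    print(stimulus, rates, "total:", sum(rates.values()))
```

The total stays at 40 reinforcements per hour in both stimuli, while the red key alternates between 20 and 40 per hour. Forty per hour corresponds to a mean interval of 60/40 = 1.5 minutes, so the equal totals are bought at the price of a higher reinforcement frequency on the nonmultiple key, which is the confound noted above.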

Bloomfield (1967a) employed still another 
method in an attempt to assess whether re- 
sponse rate or reinforcement rate is the major 
determinant of contrast. He utilized two
multiple schedules: multiple VI FR and
multiple VI DRL in which the VI component
remained unchanged. In FR schedules, the
frequency of reinforcement varies directly with 
the rate of responding; in DRL schedules, there 
is an inverse relationship between response 
rate and reinforcement frequency. Contrast 
effects were shown by the change in VI base-
line rate in a direction away from the change
in reinforcement frequency in the other com- 
ponent. Bloomfield maintained that in the 
multiple schedules used, contrast was a regular 
function of the relative reinforcement rate in 
the component in which it appeared. As more 
of the reinforcement in a session occurred in 
one component, the percentage of total re- 
sponses emitted in that component increased. 
In the same study, Bloomfield (1967a) also
attempted to assess the effect of VI responding 
on the DRL and FR components by comparing 
the rate in these schedules in isolation to the 
rates on the multiple schedule. He found that 
response rates on DRL in isolation were lower 
than those on DRL on the multiple schedule. 
The effect of the VI schedule on DRL per- 
formance, then, may be described as induction. 
Reynolds (1963) also found that induction 
rather than contrast occurred in the constant 
component of a multiple schedule when the 
rate of reinforcement in the other component 
was slightly increased from a very low value. 
An analysis of the effect of the VI component 
on FR in a multiple VI FR schedule revealed a 
more complex relationship. Bloomfield (1967a) 
found that if VI reinforcement rate is much 
below that on FR, the FR response rate will 
tend to rise. However, as FR response rate 
increases so does FR reinforcement rate, with 
the result that the difference in reinforcement 
rates between the FR and VI components is 
still larger. This should produce a continuing
increase in FR responding. But, Bloomfield has 
pointed out that if the process is visualized as 
adding decreasing increments to FR response 
rate, a limit will be reached. Thus, a point is 
finally reached where performance stabilizes
and the interaction effects are no longer
changing. Earlier, Reynolds (1963) found that
the rate of responding maintained by about
40 reinforcements per hour may not systematically
vary with the frequency of reinforcement
associated with an alternated stimulus.
Specifically, he found that extinction (zero 
reinforcements per hour) during alternated 
presentations of a different stimulus seemed to 
neither increase nor decrease rate in the un- 
changed component if the rate of responding 
were maintained by 38 reinforcements per hour. 
The evidence seems to indicate, then, that 
contrast may not occur when rate of responding 
in one component is maintained by a very 
high or a very low frequency of reinforcement. 
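
Bloomfield's argument that adding decreasing increments to FR response rate must reach a limit can be illustrated with a short numerical sketch. The starting rate, the size of the first increment, and the shrink factor below are hypothetical values chosen only for illustration; they are not data from Bloomfield (1967a), and the geometric form of the shrinkage is an assumption.

```python
def fr_rate_trajectory(start_rate, first_increment, shrink, steps):
    """Return FR response rates after adding geometrically shrinking increments.

    All four arguments are hypothetical illustration values, not data
    from Bloomfield (1967a).
    """
    rates = [start_rate]
    increment = first_increment
    for _ in range(steps):
        rates.append(rates[-1] + increment)
        increment *= shrink  # each later increment is smaller than the last
    return rates


rates = fr_rate_trajectory(start_rate=60.0, first_increment=20.0, shrink=0.5, steps=30)
# The sum of the geometric series of increments gives the ceiling on the rate.
limit = 60.0 + 20.0 / (1.0 - 0.5)
print(round(rates[-1], 6), limit)
```

Under these assumptions the rate keeps rising on every step but never passes the ceiling, which is the sense in which performance can stabilize even though each rise in FR responding enlarges the difference in reinforcement rates.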

To further complicate the picture, Reynolds 
and Limpo (1968b) reported that contrast 
appeared when reinforcement frequency in the 
changed component increased. These investiga- 
tors employed a multiple DRL DRL schedule. 
An exteroceptive stimulus was then correlated 
with reinforcement on the second DRL 
schedule. This caused a decrease in response 
rate and a corresponding increase in reinforce- 
ment frequency in this component where the 
stimulus was present. Theoretically, since on 
a DRL schedule response rate and reinforce- 
ment frequency are inversely related, it should 
have been possible to separate the effects of 
these two variables. However, since the in- 
crease in reinforcement frequency unexpectedly 
produced positive contrast effects, these two 
variables were even more confounded. The
contrast effect in component one, as evidenced 
by the increase in rate in that component, 
also resulted in a decrease in reinforcement fre- 
quency. Consequently, the observed contrast 
effects could have been caused by the decrease 
in reinforcement frequency in the unchanged 
component, the increase in reinforcement fre- 
quency in the changed component, or the 
decrease in response rate in the changed 
component. 

Terrace (1968) employed a multiple VI DRL 
schedule in an attempt to separate frequency 
of reinforcement and rate of responding. 
Pigeons were first trained to peck a key on 
a VI 1-minute schedule of reinforcement. 
After responding had stabilized on the VI 
schedule, a multiple schedule was introduced.
S1 was correlated with the same VI schedule.
S2 was correlated with a DRL schedule, the
value of which varied from day to day in an 
attempt to keep reinforcements in the two 
components equal. Three of the six animals 
run on this schedule showed contrast. The re- 
maining three animals showed very little 
change in VI rate. Terrace argued that this is 
evidence that response suppression is a suffi- 
cient condition to produce contrast. However, 
since S2 was always correlated with a DRL
schedule, response suppression in S2 never
occurred. Of course, the same type of acquisi- 
tion phenomenon described previously in rela- 
tion to Nevin’s study (1968) might also have 
occurred. Unfortunately, the data regarding 
the number of reinforcements actually ob- 
tained during DRL acquisition is not pre- 
sented. Also, since the DRL schedule was 
reportedly changing from day to day, the 
change in the schedule may have influenced 
the results. 

In the same study, Terrace also ran animals
on a multiple VI 1-minute VI 1-minute
schedule. After stability had been obtained,
one of the VIs (S2) was correlated with
punishment. Each response in S2 produced a
mild electric shock. Thus, theoretically at least,
response rate was suppressed while maintaining
equal reinforcement frequencies in both components.
Contrast was produced in this situation.
It should be noted that the introduction
of a punishing stimulus in S2 decreased reinforcement
frequency in S1 as well as in S2.
Terrace pointed out that reinforcement frequency
remained approximately equal in S1
and S2. Thus, the punishing stimulus in S2
supposedly reduced reinforcement frequency
by the same amount in both components. In
addition to the changes in reinforcement frequency,
the introduction of a punishing
stimulus in S2 was also correlated with changes
in rate of responding. Rate in S2 decreased
while rate in S1 increased (the positive contrast
effect). Thus, contrast effects were generated
in a situation where four variables were changing,
that is, reinforcement frequency in both
S1 and S2 as well as response rate in S1 and S2,
as a consequence of a single manipulation, the
introduction of a punishing stimulus in S2. It is
impossible to say which of these changes is
responsible for producing the contrast.

SUMMARY


Two questions have to be answered before 
a conclusion can be reached on what are the 
necessary and sufficient conditions for the 
occurrence of behavioral contrast effects. 
First, under what conditions have contrast 
effects been demonstrated? Contrast occurs 
when the relative frequency of reinforcement 
is altered and when responding is suppressed 
in one component of a multiple schedule. While 
both of these conditions are sufficient to pro- 
duce contrast, it has not been clearly demon- 
strated that either of these two variables alone 
provides the necessary conditions. Before this 
question can be answered, a technique must 
be developed which allows the independent 
manipulation of response rate and reinforce- 
ment frequency. As is evident from this review, 
no such technique is yet available. 

Second, under what conditions have contrast 
effects not been demonstrated? Terrace (1963a) 
has reported that contrast does not appear 
when a discrimination is learned without errors. 
In addition, contrast effects will diminish or 
even disappear after prolonged discrimination 
training (Terrace, 1966a). These two lines of 
evidence have led Terrace to conclude that the 
emotional consequences of nonresponding are 
the primary antecedent conditions for contrast. 
With prolonged training, the emotion diminishes
and consequently the contrast effects
disappear. What is missing in these experiments 
is a description of what happens to base-line 
rate of responding after prolonged exposure to 
the schedule. The fact that contrast disappears
may result from an overall depression of rate
with extended exposure to the schedule.
Another control condition that is currently
missing in the literature is what happens to
errorless discrimination with prolonged exposure.
Suppose that rate in a discrimination
learned without errors also decreases with
extended exposure to the schedule. Then an
explanation dependent upon the emotional
consequences of nonresponding or a decrease
in response rate would not be able to explain
the data.

Dunham (1968) in a review of within-subject 
contrast has pointed out that the data are 
equivocal with respect to the necessity of 
changes in reinforcement frequency for positive
contrast. He concluded that


the essential problem for developing a model to account
for positive contrast effects in the within-subjects procedure
involves the development of a conceptual linkage
between the necessary antecedent of response suppression
and the concept of emotionality as discussed by
Terrace [p. 311].


This conclusion does not seem warranted by 
the data. It has not been shown that response 
suppression alone will produce contrast effects. 
In addition, there are many ambiguities in- 
herent in any explanatory concept that deals 
with emotion. For example, what is the 
emotion? What are the conditions that produce
the emotion? Until emotion can be experi- 
mentally defined, then it adds nothing to the 
explanation of the contrast phenomena. 

It is the present reviewer's opinion that no 
adequate explanation has been offered for the 
phenomena of interaction effects on multiple 
schedules of positive reinforcement. Of the two 
discussed in this review, Reynolds’ (1961a) 
description of the data in terms of changes in 
the relative frequency of reinforcement does 
at least describe the data. An appeal to the 
emotional consequences of nonresponding does 
not add anything to this description. However, 
the mechanism that can account for all types 
of contrast effects is probably very complex 
and the result of changes in a number of 
variables. Limiting the study of contrast to 
the examination of only changes in response 
rate and reinforcement frequency in the altered 
component of the multiple schedule has re- 
sulted in the ignoring of several other variables 
which may be important. One such variable 
is the change in the overall frequency of reinforcement.
When the schedule of reinforcement
in one component of a multiple schedule is 
decreased, there is a reduction in the number 
of reinforcements available during the session. 
It may be this reduction in overall reinforce- 
ment frequency rather than the reduction in 
reinforcement or response rate in the changed 
component which produces the positive con- 
trast effect. 

Another variable which changes as a conse- 
quence of changes in reinforcement frequency 
is the number of responses per reinforcement. 
For example, if the schedule of reinforcement 
is decreased in one component of the multiple 
schedule, the number of responses per reinforcement
automatically increases even if no
other changes in response rate occur. As with 
the overall decrease in reinforcement fre- 
quency, this variable may be of tremendous 
importance. 
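
The arithmetic behind these two neglected variables can be sketched as follows. The session layout (two 30-minute components) and all rates are hypothetical values chosen for illustration, and the function names are not taken from any of the studies reviewed. Response rate is deliberately held constant so that only the schedule change moves the other quantities.

```python
def component_summary(resp_per_min, reinf_per_hour, minutes):
    """Responses, reinforcements, and responses per reinforcement for one component."""
    responses = resp_per_min * minutes
    reinforcements = reinf_per_hour * minutes / 60
    return responses, reinforcements, responses / reinforcements


def session_summary(components):
    """components: list of (resp_per_min, reinf_per_hour, minutes) tuples."""
    rows = [component_summary(*c) for c in components]
    total_reinf = sum(r[1] for r in rows)
    return {
        "overall_reinforcements": total_reinf,
        "relative_reinf_first": rows[0][1] / total_reinf,
        "responses_per_reinf": [r[2] for r in rows],
    }


# Hypothetical values: two 30-minute components, response rates held constant.
baseline = session_summary([(60, 60, 30), (60, 60, 30)])
reduced = session_summary([(60, 60, 30), (60, 20, 30)])
print(baseline)
print(reduced)
```

In this sketch a single schedule change in the second component lowers the overall number of reinforcements, raises the relative frequency of reinforcement in the first component, and triples the responses per reinforcement in the changed component, all at once. That is the confounding the text describes.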

The method used to change the schedule of 
reinforcement in one component may also con- 
tribute to the occurrence or nonoccurrence of 
the contrast phenomena. An abrupt reduction 
in reinforcement frequency might produce 
contrast whereas if the schedule were gradually 
changed no contrast would occur. Careful at- 
tention should be given to this problem in 
attempting to specify the necessary antecedent 
conditions for contrast effects. 

The question of what are the determinants 
of interaction effects is complex. Neither the
relative frequency of reinforcement nor the
response suppression hypothesis adequately
deals with all of the contributing
variables. In short, before an explanatory 
mechanism for interaction effects on multiple 
schedules of positive reinforcement can be 
postulated, the effects of changes in variables 
other than reinforcement frequency and re- 
sponse rate in the altered component have to 
be empirically determined.

LATITUDE OF REJECTION:
AN ARTIFACT OF OWN POSITION 1


O. W. MARKLEY 2


Educational Policy Research Center, Stanford Research Institute, Menlo Park, California 


It is shown that the latitude of rejection of attitude statements as measured by the
method of ordered alternatives is artifactually contaminated by the extremity of
own position. A more direct measure of the threshold of rejection was tried. Differential
“ego” or attitudinal involvement across varying own positions cannot be
validly inferred by use of either the latitude of rejection (as has been previously
done) or by the newly introduced measure of the threshold of rejection since the
latter ideally requires the property of equal intervals, which is not achieved in most
applications. It is therefore recommended that the use of the latitude measures for
comparison of differential involvement between respondents be limited to responses
of equal extremity of own position.


Method of Ordered Alternatives 


An attitude scaling method which has gene- 
rated a sizable amount of research is the 
method of ordered alternatives (Sherif & 
Sherif, 1967, p. 116; Sherif, Sherif, & Nebergall, 
1965, Chapter 2). In this method, a number 
(typically nine) of similarly worded attitudinal 
statements ranging from extremely “pro,”
through neutral, to extremely “anti” an issue
are presented in rank-ordered sequence. The
complete set of nine statements is presented 
on each of four sheets assembled into a booklet. 
On the first sheet the subject is asked simply 
to indicate the statement most acceptable to 
him; on the second, to indicate any other 
statement or statements which are also accept- 
able or not objectionable; on the third to 
select the statement most objectionable; and 
on the fourth to indicate any other statement 
or statements also objectionable. 

The method of ordered alternatives yields 
four measures useful in attitude scaling: 
(a) “Own” position: the statement found most 
acceptable to the respondent; (b) latitude of 
acceptance: the number of statements desig- 
nated acceptable or not objectionable; (c) 
latitude of rejection: the number of statements 


1 The author gratefully acknowledges the support of 
a predoctoral fellowship from the Danforth Foundation 
of St. Louis, Missouri, and partial support by National 
Science Foundation Grant GS1309X at Northwestern 
University, Evanston, Illinois.


designated objectionable; (d) latitude of non- 
commitment: the number of statements neither 
designated acceptable nor objectionable (Sherif 
et al., 1965, p. 30). 
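
The four measures can be computed mechanically from one respondent's booklet. The following is a minimal sketch; the single-character coding of the responses and the function name are assumptions made here for illustration and are not part of the published method.

```python
def latitude_measures(marks):
    """Score one response to the method of ordered alternatives.

    marks: one character per scale item, in rank order (hypothetical coding):
      'A' = most acceptable (own position), 'a' = also acceptable,
      'R' = most objectionable,             'r' = also objectionable,
      '-' = neither marked acceptable nor objectionable.
    """
    return {
        "own_position": marks.index("A") + 1,  # 1-based item number
        "latitude_of_acceptance": sum(m in "Aa" for m in marks),
        "latitude_of_rejection": sum(m in "Rr" for m in marks),
        "latitude_of_noncommitment": marks.count("-"),
    }


# A hypothetical nine-item response with own position at the second item.
scores = latitude_measures("aAa--rrrR")
print(scores)
```

Because every item falls into exactly one of the three categories, the three latitudes always sum to the number of items, so lengthening one latitude necessarily shortens the others.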

The development of the method of ordered 
alternatives was based on results of studies 
of social judgment which indicated (a) 
that “strong commitment to a position involves
a lowered threshold of rejection; that
is, the subject tends to see positions different
from his own as objectionable and to reject
them categorically [Sherif et al., p. 30]” (cf.
Hovland & Sherif, 1952; Sherif & Hovland,
1961); and (b) that the “degree of personal
involvement with a stand varies with extremity
of the stand [Sherif et al., 1965,
p. 26].”

Although involvement and extremity of an 
attitude are not identical constructs, they are 
typically correlated and are difficult to sepa- 
rate operationally (Ward, 1966). Thus, finding 
that the results of a number of studies sup- 
ported the above hypotheses, the size of the 
latitude of rejection was adopted as an opera- 
tional index of involvement (Sherif & Sherif, 
1967, p. 120). 


Artifactual Contamination of Latitude Scorings


If the size of the latitude of rejection is to 
be an operational index of involvement, and 
if involvement and extremity are highly cor- 
related, it is important that the latitude of 
rejection be free of any artifact associated 
with extremity, per se. That is, any association 
of the latitude of rejection with extremity due

TABLE 1

DIDACTIC RESPONSES TO THE METHOD OF ORDERED ALTERNATIVES ASSUMING EQUAL ITEM
INTERVALS FOR ACCEPTANCE, NONCOMMITMENT, AND REJECTION WITH DIFFERENT
EXTREMITIES OF OWN POSITION

                          Scale item                        Measure
Response set   A   B   C   D   E   F   G   H   I      LA  LNC  LR  TR

     1         √√  √   -   -   X   X   X   X   XX      2   2    5   4
     2         -   √   √√  √   -   -   X   X   XX      3   3    3   4
     3         X   -   -   √   √√  √   -   -   XX      3   4    2   4

Note. √√ = most acceptable item, the subject's own position; √ = also acceptable item; - = (blank) noncommittal
regarding item; XX = most rejected item; X = also rejected item; LA = latitude of acceptance; LNC = latitude of
noncommitment; LR = latitude of rejection; TR = a novel measure of the threshold of rejection, being the number
of items between the subject's own position and the first rejected item.



solely to properties of the scaling device would 
threaten the validity of any inferred involvement
from the size of the latitude of rejection.3

One way to test the method of ordered 
alternatives for such contamination is to 
construct didactic sets of responses with 
differing degrees of extremeness of "own position,"
but with equal item intervals for acceptance,
noncommitment, and rejection.
Table 1 shows three such sets of responses. 
If extremity had no effect on the latitude of 
rejection, it should remain constant under the 
assumption of equal item intervals.

As Table 1 shows, the latitude of rejection 
is artifactually associated with the extremity 
of own position. Those subjects who endorse 
the extreme Item A as their own position find 
no items of still greater extremity. Hence, 
that portion of their latitudes of acceptance 
and noncommitment which is based on items 
to the left of their own position is curtailed 
by ceiling effects. At the same time, the 
latitude of rejection based on items on the 
right is spuriously lengthened due to the 
increased distance to the most rejected item. 
Hence latitude differences between differing 
extremities of own position under the assump- 
tion of equal item intervals in Table 1 are 
artifactually produced. 

The finding of this artifact makes suspect 
results of previous studies in which conclusions 
were based on the existence of a valid relation- 
ship between involvement and the latitude 
scores, irrespective of differences in extremity. 


3 The author wishes to thank Donald T. Campbell 
for warning of this threat. 




For example, Powell (1966) found that dogma- 
tism was positively correlated with extremity 
of own position (r = +.46), and with latitude 
of rejection (r = +.33). Although there is no
reason to question the positive correlation 
between dogmatism and extremity, the pres- 
ent analysis suggests that the correlation 
between dogmatism and the latitude of rejec- 
tion may be due to the extremity artifact per se, 
and not to any differences in involvement 
which are associated with differences in 
dogmatism. 


More Direct Measure of the Threshold of Rejection


For whatever worth it may have as a 
“negative result,” described below is an unsuc- 
cessful attempt to devise a scoring method for 
the method of ordered alternatives which 
would yield an uncontaminated measure of 
the threshold of rejection. A number of scoring 
methods were tried. The most promising
method was to note the number of items from 
the subject’s own position to the first rejected 
item. The psychometric advantage of this 
measure as compared to the latitude of rejec- 
tion can be seen by the fact that it is invariant 
under the assumption of equal item intervals in 
Table 1. 
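
The contrast between the two measures under equal item intervals can be reproduced with a small sketch. It assumes, for illustration only, nine items, acceptance reaching one item from own position, noncommitment reaching two items further, and rejection beyond that; the own positions A, C, and E and all function names are likewise assumptions made here.

```python
ITEMS = "ABCDEFGHI"


def didactic_response(own_index, accept_reach=1, noncommit_reach=2):
    """Mark each item by its distance from own position (equal item intervals)."""
    marks = []
    for k in range(len(ITEMS)):
        distance = abs(k - own_index)
        if distance <= accept_reach:
            marks.append("accept")
        elif distance <= accept_reach + noncommit_reach:
            marks.append("noncommit")
        else:
            marks.append("reject")
    return marks


def latitude_of_rejection(marks):
    """Number of items designated objectionable."""
    return marks.count("reject")


def threshold_of_rejection(marks, own_index):
    """Number of item steps from own position to the nearest rejected item."""
    return min(abs(k - own_index) for k, m in enumerate(marks) if m == "reject")


results = []
for own in (0, 2, 4):  # own position at items A, C, and E
    marks = didactic_response(own)
    results.append((ITEMS[own], latitude_of_rejection(marks),
                    threshold_of_rejection(marks, own)))
print(results)
```

With the same underlying intervals in every case, the latitude of rejection shrinks as own position moves toward the middle of the scale, while the threshold count stays fixed.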

The new measure was used on data previ- 
ously published by Sherif et al. (1965, Figures 
24 and 2.3, pp. 32-52) regarding presidential 
candidates in the 1960 elections and on data 
produced by subjects who simulated having 
varying involvement in an attitude, but with 
a fixed extremity (Markley, 1967). Analysis
of these data indicated that the new measure
could not validly be used to compare varying 
involvement across different own positions 
unless the scale had the property of equal 
intervals. This is true because a threshold 
measure is essentially an interval measure. 
The method of ordered alternatives as typi- 
cally constructed, however, neither has nor 
was intended to have this property (Sherif 
et al., 1965, p. 25). While method-of-ordered- 
alternative-type scales could perhaps be con- 
structed which would approach the property 
of equal intervals, this is not achieved in most 
applications and can never be perfectly vali- 
dated (Green, 1954). 


Conclusion 


Although it is not valid to compare either 
thresholds or latitudes of rejection which are 
associated with different extremities, their comparison
within a given own position is apparently
permissible. Where comparison between
attitudes differing in extremity is desired, the
“Own Categories Procedure” (Sherif & Sherif,
1967; Sherif et al., 1965) may provide a viable
alternative.

ON BROGDEN'S INTERPRETATION OF FACTORS


CHESTER W. HARRIS


University of California, Santa Barbara 


Brogden’s interpretation of factors provides weights which should be applied
to the (generally unavailable) common parts of the data and not to the data
themselves. If the weights are applied to the data themselves, the result is one
of several possible estimates of common factor scores. These (Brogden’s)
weights are not, in general, proportional to the factor pattern; however, this
proportionality can be achieved if one develops the final oblique solution by
means of the Harris-Kaiser independent cluster algorithm (Harris & Kaiser,
1964). Such a solution guarantees proportionality of two sets of weights: Those
to be used to estimate the variables from the factors, and those that might be
used, following Brogden, to estimate factor scores from the variables.




Two points should be added to the recent 
discussion by Brogden (1969) of the problem 
of interpreting factor results. 

Brogden points out that it is quite proper 
to interpret the factor pattern as a set of 
weights that could be used to estimate a 
variable from the factors or factor scores, but 
that this interpretation—which normally 
should be an interpretation of the rows of the 
factor pattern—is not relevant to the estima- 
tion of the factors, given the variables. In- 
stead, Brogden suggests that there is another 
matrix of weights which would be applied to 
the common parts of the data to construct 
the factor scores, and he offers illustrations 
of his point. 

For factor analysis the factor scores are
not, in general, uniquely computable, and
although one has access to the data, the common
parts of the data are not available. Consequently,
the weights matrix $(A'A)^{-1}A'$
which Brogden describes and illustrates must 
be clearly understood to be the transformation 
of the unavailable common parts of the data 
into the unavailable factor scores. This matrix 
of weights might be applied to the actual data, 
rather than to the unavailable common parts 
of the data, and one would then have what 
Harman (1967, p. 373) describes as the 
“ideal variable” method of estimating factor 
scores. 
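
The difference between applying these weights to the common parts and applying them to the data can be seen in a small exact-arithmetic sketch. The factor pattern A, the factor scores, and the unique parts below are hypothetical numbers for a single respondent, chosen only for illustration, and the helper functions are written out here rather than taken from any library.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse_2x2(m):
    """Exact inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical factor pattern: 4 variables by 2 factors.
A = [[2, 0], [1, 1], [0, 2], [1, 0]]
A_t = transpose(A)
W = matmul(inverse_2x2(matmul(A_t, A)), A_t)  # the weights (A'A)^-1 A'

theta = [[3], [-1]]               # hypothetical factor scores
common = matmul(A, theta)         # common parts of the four variables
unique = [[1], [-2], [0], [1]]    # hypothetical unique parts
data = [[c[0] + u[0]] for c, u in zip(common, unique)]

from_common = [row[0] for row in matmul(W, common)]
from_data = [row[0] for row in matmul(W, data)]
print(from_common, from_data)
```

Applied to the common parts, the weights return the factor scores exactly; applied to the data, they return the factor scores plus a weighted sum of the unique parts, which is why the result is only one of several possible estimates.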

The first point is that the question of how
to estimate factor scores is central to Brogden’s
discussion, though he does not explicitly
state this. Since there are many methods of
estimating factor scores, and these different 
methods are likely to give different sets of 
weights for estimating the factors from the 
variables, the interpreter must make some 
choice. Both Harris (1967) and McDonald 
and Burr (1967) have shown similarities and 
differences among several such methods; fur- 
ther, additional methods are being developed.? 
Brogden’s choice of the ideal variable method 
should not be assumed to be the only choice. 
The first point, then, is that although the 
factor pattern gives a unique set of weights 
that describe the variables as linear functions 
of the factors (factor scores), there are several,
usually different sets of weights that
estimate the factors in terms of the actual 
variables. One must choose, and Brogden's 
choice is not necessarily the choice of others. 

The second point is that although Brogden's 
weights are not generally proportional to the 
factor pattern, there is a case in which they
will be. This is the case in which A'A is a
diagonal matrix; this can be achieved by the
well known “independent cluster” solution
which uses the Harris-Kaiser (1964, p. 357)
method of developing an oblique solution by
employing only orthonormal and diagonal
transformation matrices. Consequently, if the
interpreter wishes to use Brogden’s procedure, 
one path he could follow would be to use the 
Harris-Kaiser independent cluster solution 
algorithm, which is extremely easy to program 
and to execute, as the method of developing 
his oblique solution; then the resulting factor

2 Unpublished papers by F Edu Roger Pennell and Sukeyori
Shiba extend the number of methods

pattern would have the same relative weights
as would the matrix developed by the ideal
variable estimation procedure. It is certainly
true that the Harris-Kaiser independent
cluster solution is a “bad” one for some sets of
data; in fact, an excellent illustration of a
bad solution is included in the original article.
This fact simply means that for some
sets of data corresponding (proportional) sets 
of weights for estimating variables from fac- 
tors and sets of weights for estimating fac- 
tors from variables do not have the charac- 
teristics ordinarily associated with the con- 
cept of simple structure.

FURTHER COMMENTS ON THE
INTERPRETATION OF FACTORS


HUBERT E. BROGDEN


Purdue University 


In commenting on an earlier paper, Professor Harris has suggested that the
matrix of weights on the latent test vectors which reproduce the factors is one
[...] earlier paper, a distinction is made between factor score estimation and factor
interpretation, and it is suggested that weight matrices appropriate for estimation
may not be appropriate for interpretation. Thus, the comments of Professor
Harris do not seem to be relevant to the earlier paper on factor interpretation.




In stating that $(A'A)^{-1}A'$ is a transformation
from the unavailable common parts
(my £) of the observed variables into the
factors, Harris (1971) is in essential agreement
with my Equation 3 and the associated
definitions and discussion. He then states, possibly
prompted by the use of $(A'A)^{-1}A'$ to
interpret the factors knowing the observed
variables (my Tables 3 and 4), that $(A'A)^{-1}A'$
corresponds to a matrix of weights that, in
the "ideal variable" method, is used to esti- 
mate factor scores from the observed vari- 
ables. He goes on to point out that many 
methods of factor score estimation are avail- 
able, each with an associated weight matrix, 
and that one must choose amongst these. 
This statement involves misunderstanding of 
my intent, and clarification is needed. Re- 
garding a second point made by Harris, I do 
not disagree and do not feel that additional 
comment is needed. 

My original intent would have been clear, 
possibly, if I had labeled the rows of Tables 
3 and 4 as the common parts of the variables 
rather than as the variables themselves and 
thus avoided any suggestion that $(A'A)^{-1}A'$
might prove useful in interpreting the factors
knowing the observed variables (Brogden,
1969). Such treatment of the topic would
have been adequate for my purpose since examples
of $(A'A)^{-1}A'$ were exhibited as a
means of showing that this matrix of weights
did lead to proper interpretation of the factors
and thus lending emphasis, by the contrast
between $(A'A)^{-1}A'$ and A, to my contention
that A (the weights on the common factors in
the basic model) were not appropriate to the 
interpretation of the factors from the ob- 
served variables. 

If one takes the point of view, as Harris
seems to (1971), that any weighted sum
of the observed variables that is potentially
relevant to the problem of factor
interpretation is also relevant to the problem
of factor score estimation, and vice versa, it
would seem to follow that one does need to
choose between the available methods of
estimation in approaching the problem of
factor interpretation. However, the problem
of interpretation is distinct from the problem
of estimation. Thus a factor interpretation
based on the regression weights ($\Sigma^{-1}\Gamma$, in the
notation of my original article) that one
might use to obtain $\hat{\theta}$ or $\Gamma'\Sigma^{-1}X$ would obviously
lead to an interpretation of the predicted
factor scores ($\hat{\theta}$) as distinct from an
interpretation of the factors themselves. Since
the structure of $\hat{\theta}$ may be materially different
from that of $\theta$, as Professor Harris is well
aware (Harris, 1967), this distinction is not
an idle one. On the other hand, $(A'A)^{-1}A'$,
when used to interpret the factors given the
observed variables, does not seem to suffer
from this deficiency of the regression weights.
From Equations 2 and 3 of my original paper,
it is clear that


$$(A'A)^{-1}A'X = (A'A)^{-1}A'(A\theta + \epsilon) = \theta + (A'A)^{-1}A'\epsilon \qquad [1]$$

and it would be reasonable, if $\epsilon$ were random
error, to hold that $\theta$ can be interpreted, when
the interpretation of X is known, since the
vector, $(A'A)^{-1}A'\epsilon$, has elements that are
weighted sums of random error components,
and these elements should not systematically
alter our interpretation of an element of $\theta$.
Normally, of course, $\epsilon$ contains reliable specific
variance as well. While the presence of
specifics clearly introduces additional problems,
these problems are inherent to factor
interpretation, however it may be accomplished.
Note that the foregoing has no clear
relevance to the adequacy of $(A'A)^{-1}A'$ in
estimating factor scores.

The above discussion could be extended by
considering additional weight matrices (Harris,
1967) that have been used to estimate
the factor scores. The examples given are sufficient
to illustrate the distinction between
objectives of estimation and interpretation.
While this distinction in objectives is pertinent
to the possible practical use of $(A'A)^{-1}A'$ in
factor interpretation, it is not essential to the
major thesis of my original paper.

COMMENT ON OVERALL AND SPIEGEL'S “LEAST SQUARES
ANALYSIS OF EXPERIMENTAL DATA”


GEORGE W. JOE


Institute of Behavioral Research, Texas Christian University


Comments are made concerning the model employed by Overall and Spiegel in
their design matrix in the use of a multiple regression technique to the problem of
analysis of variance.


With the publication of Overall and Spiegel’s 
(1969) article on the use of a regression ap- 
proach to analysis of variance problems, it is 
likely that many researchers will follow their 
methods of analysis. The remarks which follow 
should be taken as a call for a more thorough 
understanding of the model used by Overall 
and Spiegel. 

While this writer is not a “theoretical statis- 
tician,” there are some points concerning 
Overall and Spiegel’s basic model which are 
not brought out by them, but which are im- 
portant to any researcher using it. 

As pointed out by Overall and Spiegel, the
usual model for a two-way analysis of variance
with p levels of the α factor and q levels of the
β factor is written

$$X_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \epsilon_{ijk} \qquad [1]$$

where the parameters are defined in their
usual sense (see Overall & Spiegel).

In terms of the general linear model, one
may write:

$$X = A\phi + \epsilon \qquad [2]$$

where X is a column vector containing the observed
scores, A is the design matrix for the
experiment, φ is the column vector of parameters
for the model, and ε is the error vector. X
is of order (N × 1), A of order (N × z), φ of
order (z × 1), and ε of order (N × 1).

The method of analysis of Overall and
Spiegel has the following model, although it is
not given by them:

$$X = AT\gamma + \epsilon \qquad [3]$$

where X, A, and ε are defined as before. The
matrix AT is the matrix that Overall and
Spiegel call their "design matrix" used in the
multiple regression. γ is the column vector of
regression parameters, X is of order (N × 1),
A of order (N × z), γ of order (t × 1), and ε
of order (N × 1).

T is a matrix of order (z X t) and is a 
matrix such that it forms dummy variables 
from the original design matrix, A, of the 
nature: (o)— a), ..., (a1 — o), (oua 
= a), D" (a I aj), (81 = Ba), nmm (Bai 
— Ba), (Barı — Ba), ..., (Ba — Ba), and all 
products of (am —az)(@n — Bu) for n= 1, 

+» (and m = 1, ..., p; however, n z£ d and 
m = l. The product of a,@,, which represents 
interaction, is designated by ôs later. 

The product of T’T is a particular non- 
singular matrix. Ata“ 

As one sees later, the regression coefficients 
are the solutions one would get if one used the 
nonestimable conditions in solving the regular 
analyses of variance normal equations in the 
case of equal cells. 

From Equations 2 and 3, it is seen that

$$\phi = T\gamma, \qquad [4]$$

and the “normal equations” based upon Equation 3 turn out to be

$$T'A'X = T'A'AT\gamma \qquad [5]$$

$$\hat{\gamma} = (T'A'AT)^{c}\,T'A'X. \qquad [6]$$

The variance-covariance matrix for $C\hat{\gamma}$ is

$$\operatorname{var}(C\hat{\gamma}) = \sigma^{2}\,C\,(T'A'AT)^{c}\,C' \qquad [7]$$

under the assumption that $\operatorname{var}(X) = \sigma^{2}I$ and $C\gamma$ is estimable.

The sum of squares for error equals

$$SS_e = X'X - X'AT\,(T'A'AT)^{c}\,T'A'X, \qquad [8]$$

where $(T'A'AT)^{c}$ is the conditional inverse of $(T'A'AT)$. The conditional inverse is needed if $(T'A'AT)$ is singular. Singularity, however, is likely to be rare; therefore, T'A'AT will be invertible in most cases. Hence, in Equations 6, 7, and 8, the regular inverse of (T'A'AT) should be used where the conditional inverse is indicated. Likewise, if T'A'AT is invertible, C in Equation 7 is the identity matrix.

Since illustrations often prove to be valuable 
insight aids, consider the following two-way 
design with two levels of A and three levels of 
B with unequal cells. 


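                  B
            1        2        3
A     1   X_{11}   X_{12}   X_{13}
      2   X_{21}   X_{22}   X_{23}

$$
E(X) = A\phi =
\begin{bmatrix}
j_{11} & j_{11} & 0 & j_{11} & 0 & 0 & j_{11} & 0 & 0 & 0 & 0 & 0 \\
j_{12} & j_{12} & 0 & 0 & j_{12} & 0 & 0 & j_{12} & 0 & 0 & 0 & 0 \\
j_{13} & j_{13} & 0 & 0 & 0 & j_{13} & 0 & 0 & j_{13} & 0 & 0 & 0 \\
j_{21} & 0 & j_{21} & j_{21} & 0 & 0 & 0 & 0 & 0 & j_{21} & 0 & 0 \\
j_{22} & 0 & j_{22} & 0 & j_{22} & 0 & 0 & 0 & 0 & 0 & j_{22} & 0 \\
j_{23} & 0 & j_{23} & 0 & 0 & j_{23} & 0 & 0 & 0 & 0 & 0 & j_{23}
\end{bmatrix}
\begin{bmatrix}
\mu \\ \alpha_1 \\ \alpha_2 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \delta_{11} \\ \delta_{12} \\ \delta_{13} \\ \delta_{21} \\ \delta_{22} \\ \delta_{23}
\end{bmatrix}
\qquad [9]
$$

where j indicates a column vector of ones, with one element for each observation in the cell named by its subscripts. For our model, T is a matrix such that it forms the following dummy variables from our A: $(\alpha_1 - \alpha_2)$, $(\beta_1 - \beta_3)$, $(\beta_2 - \beta_3)$, $(\alpha_1 - \alpha_2)(\beta_1 - \beta_3)$, and $(\alpha_1 - \alpha_2)(\beta_2 - \beta_3)$. In our illustration T is the following matrix:

$$
T =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & -1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & -1 & -1 \\
0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}
\qquad [10]
$$

and

$$\gamma' = (\gamma_0\ \ \gamma_1\ \ \gamma_2\ \ \gamma_3\ \ \gamma_4\ \ \gamma_5). \qquad [11]$$

If one were to compute AT, one would get the “design matrix” of Overall and Spiegel. Since the relationship of the method of Overall and Spiegel to the general linear model is of interest, we now consider the relationship of $\gamma$ to $\phi$.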

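Since $\phi = T\gamma$ in our model specified by Equation 3, $T\gamma$ from our example equals

$$(T\gamma)' = (\gamma_0,\ \gamma_1,\ -\gamma_1,\ \gamma_2,\ \gamma_3,\ -\gamma_2 - \gamma_3,\ \gamma_4,\ \gamma_5,\ -\gamma_4 - \gamma_5,\ -\gamma_4,\ -\gamma_5,\ \gamma_4 + \gamma_5). \qquad [12]$$

Therefore, comparing corresponding elements of $\phi$ and $T\gamma$, we have

$$
\begin{aligned}
&\mu = \gamma_0,\quad \alpha_1 = \gamma_1,\quad \alpha_2 = -\gamma_1,\quad \beta_1 = \gamma_2,\quad \beta_2 = \gamma_3,\quad \beta_3 = -\gamma_2 - \gamma_3,\\
&\delta_{11} = \gamma_4,\quad \delta_{12} = \gamma_5,\quad \delta_{13} = -\gamma_4 - \gamma_5,\\
&\delta_{21} = -\gamma_4,\quad \delta_{22} = -\gamma_5,\quad \delta_{23} = \gamma_4 + \gamma_5.
\end{aligned}
$$

As one can see,

$$\sum_{i} \alpha_i = 0,\qquad \sum_{j} \beta_j = 0,\qquad \text{and}\qquad \sum_{ij} \delta_{ij} = 0.$$

Therefore, from Equation 6 we have $\hat{\gamma}_1$ corresponding to $\alpha_1$, $\hat{\gamma}_2$ to $\beta_1$, etc.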
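
The correspondence between $\gamma$ and $\phi$ can be checked numerically. The following is a minimal sketch in plain Python, assuming the T matrix of Equation 10; the values placed in `gamma` are hypothetical and serve only to show that any choice of $\gamma$ yields parameters satisfying the side conditions listed above.

```python
# Rows of T follow phi' = (mu, a1, a2, b1, b2, b3, d11, d12, d13, d21, d22, d23).
T = [
    [1, 0, 0, 0, 0, 0],     # mu
    [0, 1, 0, 0, 0, 0],     # alpha_1
    [0, -1, 0, 0, 0, 0],    # alpha_2
    [0, 0, 1, 0, 0, 0],     # beta_1
    [0, 0, 0, 1, 0, 0],     # beta_2
    [0, 0, -1, -1, 0, 0],   # beta_3
    [0, 0, 0, 0, 1, 0],     # delta_11
    [0, 0, 0, 0, 0, 1],     # delta_12
    [0, 0, 0, 0, -1, -1],   # delta_13
    [0, 0, 0, 0, -1, 0],    # delta_21
    [0, 0, 0, 0, 0, -1],    # delta_22
    [0, 0, 0, 0, 1, 1],     # delta_23
]

# Hypothetical regression weights (gamma_0 ... gamma_5), chosen for illustration.
gamma = [10.0, 2.0, 1.5, -0.5, 0.75, -0.25]

# phi = T gamma (Equation 4).
phi = [sum(t * g for t, g in zip(row, gamma)) for row in T]
mu, a1, a2, b1, b2, b3, d11, d12, d13, d21, d22, d23 = phi

alpha_sum = a1 + a2
beta_sum = b1 + b2 + b3
delta_row_sums = [d11 + d12 + d13, d21 + d22 + d23]
delta_col_sums = [d11 + d21, d12 + d22, d13 + d23]

print(phi)
print(alpha_sum, beta_sum, delta_row_sums, delta_col_sums)
```

In this sketch the interaction terms sum to zero within every row and every column of the design, which is a stronger statement than the overall sum given in the text.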

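The matrix T'A'AT is a square symmetric matrix. For our illustration T'A'AT is the following matrix, written in terms of the cell frequencies $n_{ij}$, with a dot denoting summation over the subscript it replaces and N the total number of observations (only the upper triangle is shown):

$$
T'A'AT =
\begin{bmatrix}
N & n_{1.} - n_{2.} & n_{.1} - n_{.3} & n_{.2} - n_{.3} & n_{11} - n_{13} - n_{21} + n_{23} & n_{12} - n_{13} - n_{22} + n_{23} \\
 & N & n_{11} - n_{13} - n_{21} + n_{23} & n_{12} - n_{13} - n_{22} + n_{23} & n_{.1} - n_{.3} & n_{.2} - n_{.3} \\
 & & n_{.1} + n_{.3} & n_{.3} & n_{11} + n_{13} - n_{21} - n_{23} & n_{13} - n_{23} \\
 & & & n_{.2} + n_{.3} & n_{13} - n_{23} & n_{12} + n_{13} - n_{22} - n_{23} \\
 & & & & n_{.1} + n_{.3} & n_{.3} \\
 & & & & & n_{.2} + n_{.3}
\end{bmatrix}
\qquad [13]
$$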
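
The product T'A'AT for the 2 × 3 illustration can also be formed numerically. The sketch below uses plain Python lists; the helper names (`design_matrix`, `cross_product`) and the unequal cell sizes are assumptions made for illustration and do not come from the original article. It builds A from the cell sizes, multiplies by the T of Equation 10, and forms (AT)'(AT) for one unequal-cell and one equal-cell layout.

```python
# Rows of T follow phi' = (mu, a1, a2, b1, b2, b3, d11, d12, d13, d21, d22, d23).
T = [
    [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, -1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, -1, -1, 0, 0],
    [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, -1, -1],
    [0, 0, 0, 0, -1, 0], [0, 0, 0, 0, 0, -1], [0, 0, 0, 0, 1, 1],
]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]

def design_matrix(cell_sizes):
    """Build A for a 2 x 3 layout; cell_sizes[i][j] is the count in cell (i, j)."""
    rows = []
    for i in range(2):
        for j in range(3):
            row = [0] * 12
            row[0] = 1              # mu
            row[1 + i] = 1          # alpha_i
            row[3 + j] = 1          # beta_j
            row[6 + 3 * i + j] = 1  # delta_ij
            rows.extend(list(row) for _ in range(cell_sizes[i][j]))
    return rows

def cross_product(cell_sizes):
    at = matmul(design_matrix(cell_sizes), T)   # the regression "design matrix" AT
    return matmul(transpose(at), at)            # T'A'AT

unequal = cross_product([[3, 2, 4], [1, 5, 2]])  # hypothetical unequal cells
equal = cross_product([[2, 2, 2], [2, 2, 2]])    # two observations per cell

for row in equal:
    print(row)
```

With equal cells, the entries linking the mean, the α column, the β columns, and the interaction columns all vanish, whereas with unequal cells they generally do not.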
For our illustration, T'A'X equals

$$
T'A'X =
\begin{bmatrix}
G \\
X_{1..} - X_{2..} \\
X_{.1.} - X_{.3.} \\
X_{.2.} - X_{.3.} \\
X_{11.} - X_{13.} - X_{21.} + X_{23.} \\
X_{12.} - X_{13.} - X_{22.} + X_{23.}
\end{bmatrix}
\qquad [14]
$$

where G is the grand total, and

$$X_{i..} = \sum_{j=1}^{3} X_{ij.}, \qquad X_{.j.} = \sum_{i=1}^{2} X_{ij.}.$$

As pointed out by Overall and Spiegel, one must determine particular R's to determine the sums of squares due to a particular source. T is a matrix such that if one postulates a model without interaction, one need only strike out the rows and columns of T'A'AT associated with the interaction parameters.

In the special case of equal cells, one can see from Equation 13 that T'A'AT is a super-diagonal matrix. It will be so in general:

$$
T'A'AT =
\begin{array}{c|cccc}
 & \mu^* & \alpha^* & \beta^* & \delta^* \\ \hline
\mu^* & N & 0' & 0' & 0' \\
\alpha^* & 0 & \text{solid} & \text{zeroes} & \text{zeroes} \\
\beta^* & 0 & \text{zeroes} & \text{solid} & \text{zeroes} \\
\delta^* & 0 & \text{zeroes} & \text{zeroes} & \text{solid}
\end{array}
$$

Therefore, for equal cells, the following relationships are obvious:

$$R^2(\alpha_i^*, \beta_j^*) = R^2(\alpha_i^*) + R^2(\beta_j^*),$$

and

$$R^2(\alpha_i^*, \beta_j^*, \delta_{ij}^*) = R^2(\alpha_i^*) + R^2(\beta_j^*) + R^2(\delta_{ij}^*).$$

Although our illustration is for a two-way analysis of variance with two levels of one factor and three levels of the other factor, the generalization to an N-way analysis of variance with multilevels in each factor is direct.

As one can see, the model used by Overall and Spiegel is the general linear model. Therefore, the assumptions and restrictions associated with the theory of analysis of variance must be observed by researchers who choose to use the method of Overall and Spiegel. Therefore, a more detailed and theoretical presentation of the Overall and Spiegel method which includes tests of hypotheses, etc., needs to be made if researchers who use it are to be able to interpret their results adequately.
NOTE ON CHOOSING BETWEEN COMPETING
INTERPRETATIONS OF CROSS-LAGGED 
PANEL CORRELATIONS 


ROLF GUNNAR SANDELL 1 


Institutet för Konsumtionsforskning


The possibilities of drawing causal inferences from cross-lagged panel correla- 
tions are considered. Assuming that the effects of a causal state vanish with 
time, and assuming three-wave panel data, certain patterns of cross-lagged 
correlations are open to causal interpretations. 


Campbell's cross-lagged correlation tech- 
nique (Campbell & Stanley, 1963) has re- 
cently been discovered to offer competing in- 
terpretations (Rozelle & Campbell, 1969; 
Yee & Gage, 1968a, 1968b). Specifically, the 
technique in itself does not discriminate be- 
tween congruent effects of one of the two 
variables involved and incongruent effects of 
the other. As Rozelle and Campbell (1969) note, “This greater equivocality must be accepted as a definite reduction over previous estimates of the utility of the method [p. 76].”

Yee and Gage (1968a, 1968b) proposed a 
method to identify congruent and incongruent 
changes, which is essentially an extension of 
Campbell's method of analysis. The idea is 
to identify congruent changes with increases 
and incongruent changes with decreases in 
synchronous correlations. The causal direc- 
tion is then inferred as in Campbell’s method. 
However, it would seem that this could not be unconditionally true, depending as it must on converging and diverging changes in the variables themselves. Moreover, as Duncan (1969) recently has pointed out, causal inferences will always be underdetermined by two-wave, two-variable (2W2V) data.

However, by a simple and reasonable as- 
sumption and a modification of the opera- 
tional procedures, a possible solution for 
identifying congruent and incongruent causal 
relationships might be found. 

The assumption is that the effects of a causal state or event decrease (vanish, dissipate) over time. In fact, this is a commonplace observation in any realm of investigation. It should be pointed out, however, that the phenomenon is observable only insofar as the causal state itself changes over time.

1 Requests for reprints should be sent to Rolf Gunnar Sandell, Institutet för Konsumtionsforskning, Kungsgatan 53, 111 22 Stockholm, Sweden.

The modification in procedure consists sim- 
ply of taking measurements on both variables 
three times instead of twice, as suggested by 
Campbell and Stanley. 

Consider then observations in variables A and B on a panel sample on three occasions, 1, 2, and 3. Cross-correlations between Occasions 1 and 2 would indicate the possible pair of interpretations, as in the original method of Campbell and Stanley. That is, if $r_{a_1b_2} > r_{b_1a_2}$, then A causes B congruously or B causes A incongruously, more than the other way around.

Then, observe cross-correlations between Occasions 1 and 3. If A causes B congruously, and if, as assumed, the effect vanishes over time, then $r_{a_1b_3} < r_{a_1b_2}$. At the same time, $r_{b_1a_3}$ would be equal to $r_{b_1a_2}$, if the causal relationship is unidirectional.

On the other hand, if B causes A incongruously and if it is still assumed that the effect vanishes over time, then $r_{b_1a_3} > r_{b_1a_2}$. If the causal relationship is unidirectional, $r_{a_1b_3}$ would at the same time be equal to $r_{a_1b_2}$.

The method makes it further possible to identify joint effects of different kinds, that is, different bidirectional causal relationships.

Thus, still assuming $r_{a_1b_2} > r_{b_1a_2}$ and $r_{a_1b_3} < r_{a_1b_2}$, $r_{b_1a_3} < r_{b_1a_2}$ would indicate a bidirectional congruent causal relationship, whereas $r_{b_1a_3} > r_{b_1a_2}$ would indicate that B causes A incongruously while A causes B congruously. Conversely, assuming $r_{a_1b_2} > r_{b_1a_2}$ and $r_{b_1a_3} > r_{b_1a_2}$, $r_{a_1b_3} > r_{a_1b_2}$ would indicate a bidirectional incongruent causal relationship. All other sets of relations between $(r_{b_1a_3} - r_{b_1a_2})$ and $(r_{a_1b_3} - r_{a_1b_2})$ are equivocal, assuming as before $r_{a_1b_2} > r_{b_1a_2}$.

For a summary, consider Table 1.


TABLE 1

SUMMARY OF INTERPRETATIONS OF THREE-WAVE PANEL CROSS-LAGGED CORRELATIONS,
ASSUMING r(a1b2) > r(b1a2)

                    |                        r(a1b3) - r(a1b2)
r(b1a3) - r(b1a2)   | < 0                       | = 0                        | > 0
--------------------+---------------------------+----------------------------+---------------------------
> 0                 | mixed bidirectional       | incongruous unidirectional | incongruous bidirectional
                    | A↑ ⇒ B↑, B↑ ⇒ A↓          | B↑ ⇒ A↓                    | A↑ ⇒ B↓, B↑ ⇒ A↓
= 0                 | congruous unidirectional  | equivocal                  | equivocal
                    | A↑ ⇒ B↑                   |                            |
< 0                 | congruous bidirectional   | equivocal                  | equivocal
                    | A↑ ⇒ B↑, B↑ ⇒ A↑          |                            |
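
The decision rules summarized in Table 1 can be written as a small lookup. This is only a sketch: the function name `interpret` and the tolerance argument `tol` (the margin within which two correlations are treated as equal) are assumptions introduced here, since the note itself does not say how equality should be judged in sample data.

```python
def interpret(r_a1b2, r_b1a2, r_a1b3, r_b1a3, tol=0.0):
    """Classify a three-wave pattern of cross-lagged correlations per Table 1.

    Assumes r_a1b2 > r_b1a2, as the table does. `tol` is a hypothetical
    margin within which a difference is counted as zero.
    """
    if not r_a1b2 > r_b1a2:
        raise ValueError("Table 1 assumes r_a1b2 > r_b1a2")

    def sign(diff):
        if abs(diff) <= tol:
            return 0
        return 1 if diff > 0 else -1

    row = sign(r_b1a3 - r_b1a2)   # change in the B-to-A cross-lag
    col = sign(r_a1b3 - r_a1b2)   # change in the A-to-B cross-lag

    table = {
        (1, -1): "mixed bidirectional: A causes B congruously, B causes A incongruously",
        (1, 0): "incongruous unidirectional: B causes A",
        (1, 1): "incongruous bidirectional",
        (0, -1): "congruous unidirectional: A causes B",
        (-1, -1): "congruous bidirectional",
    }
    return table.get((row, col), "equivocal")


# Hypothetical correlations: the A-to-B cross-lag fades, the B-to-A one is stable.
print(interpret(0.50, 0.30, 0.40, 0.30))
```

Any pattern outside the five interpretable cells falls through to "equivocal", matching the remaining cells of the table.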
ANALYSIS OF THE COMBINATION OF
PERCEPTUAL DIMENSIONS 1


H. KAUFMAN 2


University of Connecticut 


AND 


ROBERT M. LEVY 


Indiana State University 


Processing (identification) of stimuli in sets varying along many perceptual dimensions is analyzed into components related to the processing of each separate dimension. The components, in multidimensional processing, of response bias, interaction, and correlation are defined and reduced to informational terms. Finally, it is shown how the total multidimensional information transmission can be partitioned into components corresponding to these concepts.


It is commonplace that the human can 
identify quickly and accurately a vast number 
of stimuli in many sense modalities. The re- 
markable thing about this capability is that it 
is achieved despite the fundamental limitation 
on human identification capacity pointed out 
by Miller (1956) in his article “Magical Num- 
ber Seven.” The limitation, since verified many 
times, is that the number of stimuli differing 
along a single perceptual dimension that can 
be identified without error is about seven, or 
from five to nine. In slightly different terms, 
the channel capacity for unidimensionally 
varying stimuli is somewhat under 3 bits. It follows evidently that the efficiency, in the sense of high channel capacity, of the human information transmission system is limited by the extent to which stimuli vary along many dimensions simultaneously. Not so obvious are the rules of dimensional combination: What are the characteristics of component dimensions in multidimensional stimulus sets which make for distinctive elements? This question, fundamental in perception, has been given hardly any experimental consideration. One reason is that until recently very little theoretical basis for experimentation has been developed.


1 Acknowledgment is due the Office of Naval Research, which supported this research, in part, through a prime contract NOnr 2512 (00) with General Dynamics/Electric Boat as a part of the Submarine Integrated Control (SUBIC) program. Work on this article was undertaken while the second author was at General Dynamics/Electric Boat, Groton, Connecticut. A portion of this paper was presented at the meeting of the Eastern Psychological Association, Philadelphia, April 1969.

2 Requests for reprints should be addressed to H. Kaufman, Department of Psychology, University of Connecticut, Storrs, Connecticut, 06268.

Eriksen and Hake (1955), Lockhead (1966), 
Corcoran (1966), and most recently Garner 
and Morton (1969) have raised and discussed 
the basic issues with which we are here con- 
cerned. 

Eriksen and Hake provided the basic paradigm for isolating and manipulating the effects of dimensional combination (compound stimuli) and raised the issue of “independence of judgment [1955, p. 113].” Lockhead (1966) and Corcoran (1966) both referred to the “independence of stimulus dimensions” and, of course, Garner and Morton (1969), in an article published while the present manuscript was in draft, provided a convincing analysis of the concept of “perceptual independence.”

In our procedure the concept of perceptual independence is an important part of a general informational analysis. One of the features of our model is that it represents, in a formal setting entirely consistent with that presented by Garner and Morton, a more general dimensional analysis in which perceptual independence is a special case. In general, our purpose is to present a procedure for analyzing identification responses as a function of the combination of dimensions along which stimuli vary.
For the purpose of analysis it is sufficient to assume that a number of perceptual dimensions have been isolated and that any combination of values on all dimensions is possible. We assume that it is possible to fix all but a given dimension, A, to vary the stimulus along A, generating the values $A_1, A_2, \ldots, A_{N_A}$, and similarly for $B_1, \ldots, B_{N_B}$, $C_1, \ldots, C_{N_C}$. With bidimensional variation, the values $A_1B_1, A_1B_2, \ldots, A_{N_A}B_{N_B}$ or $A_1C_1, \ldots, A_{N_A}C_{N_C}$, etc., make up the stimulus set. In a typical experiment the set of stimuli is specified, each stimulus being labeled unambiguously; then the stimuli are presented for identification a large number of times with the order of presentation randomized.

As an example, suppose that the combination of hue and brightness is under investigation. Three series of stimuli are prepared: $A_1$, $A_2$, two color patches of the same brightness, but different hues; $B_1$, $B_2$, two patches of the same hue, but different brightness; and $A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$, four patches varying in both hue and brightness.

At the outset we recognize that some major problems must be met in attempting to collect meaningful data according to the procedure illustrated in the example. With the jnd separation between unidimensionally varying stimuli fairly large (say five or more), it will require up to eight or more stimulus points along the dimension to obtain less than perfect performance with normal adult subjects by the usual procedure. With 10 values on each of A and B, the complete or orthogonal bidimensional set (the set of pairs, all values on the A dimension with all values on the B dimension) contains 100 stimuli. This presents a formidable experimental problem. On the other hand, to obtain less than perfect performance with two values on a dimension, as in the example, factors other than the information load per stimulus must be introduced. The problem, however, is not of immediate concern in the purely formal analysis which we are presenting here. For this we assume only that the three sets of stimulus-response (S-R) matrices $\{A:\bar{A}\}$, $\{B:\bar{B}\}$, and $\{AB:\bar{A}\bar{B}\}$, with the superposed bar indicating a response, are available (the S-R matrix has as its entries p(S, R), the probability of identifying the Sth stimulus with the Rth response), and that the entries in the unidimensional matrices are not trivially 0 or 1, that is, “less than perfect performance.” A set of hypothetical results from such a procedure is given in Figure 1 showing the matrices for an A and B one-dimensional series with two alternatives in each, and the S-R matrix for the two-dimensional AB series with four alternatives.


a. The A dimension                   b. The B dimension

            Stimulus                             Stimulus
Response    A1     A2                Response    B1     B2
   A1      .40    .20                   B1      .40    .26
   A2      .10    .30                   B2      .10    .24

T(A:Ā)1D = .124510                   T(B:B̄)1D = .064433


c. The AB bidimension

                      Stimulus
Response    A1B1    A1B2    A2B1    A2B2
  A1B1      .160    .104    .080    .052
  A1B2      .040    .096    .020    .048
  A2B1      .040    .026    .120    .078
  A2B2      .010    .024    .030    .072

T(AB:ĀB̄) = .188943


FIG. 1. Hypothetical results of identification experiments: stimulus-response matrices.

Another factor of great psychological interest 
which our analysis can afford to ignore is the 
labeling of multidimensional stimuli. With 
even as few as five points on each of three di- 
mensions combined orthogonally, there are 125 
stimuli to be recognized. To use a response 
scheme calling for the identification on each of 
the dimensions is to assume that the dimensions 
are perceptually distinct. This surely is not 
justified in many conditions and, in any case, 
should be left to empirical test. For our analy- 
sis, it is sufficient to assume that there exists a 
one-one mapping of responses on stimuli such 
that each response serves to "identify" un- 
ambiguously one and only one stimulus of the 
set. 

Our aim in what follows is to define some important experimental effects in identification processes with one- and two-dimensional stimuli, and to develop an analytic context in which these effects can be separated and measured. It becomes evident that this analytic context is provided by information theory measures.


INTERACTION

The first question we ask of multidimensional stimulus identification is how the dimensional components interact. The term interaction is used here to imply a comparison of the identifiability of a given stimulus component, for instance, $A_1$, when presented alone and when presented in various bidimensional combinations. By presented alone we mean, as a minimum requirement, that the value of the second dimension is constant across all values of unidimensional variation. Some of our readers have taken exception to our choice of the term “interaction” for a major class of dimensional combination effects. Apparently the major reason for the uneasiness is that the two “interactions” have some very different properties. A narrow interpretation of the statistical concept of interaction might lead to some confusion; however, we believe that the statistical and perceptual characteristics of “interaction” are indeed similar. If this is the case, then not only is the use of the familiar statistical term appropriate, but it should facilitate interpretation of the “interaction” when used to describe perceptual processes.

There are a number of different kinds of 
possible interactions, two of which seem especi- 
ally important. We shall call them general di- 
mensional interaction and specific component 
interaction. 


General Dimensional Interaction 

The most general interaction effect of combining two dimensions in a set of stimuli is that the identifiability of a component dimension may be different in the bidimensional context than alone. The data in Figures 1 and 2 provide an example. From the two-dimensional matrix, both an $\{A:\bar{A}\}$ and a $\{B:\bar{B}\}$ matrix can be derived by summing over appropriate cells. For example, the $\{A:\bar{A}\}$ matrix in Figure 2d is obtained from the matrix of Figure 2c by summing all cell entries with a given $A_i$ and $\bar{A}_j$ component value; thus the entry for the $A_1$ column, $\bar{A}_1$ row cell in Figure 2d is the sum of the four cell entries in Figure 2c corresponding to an $A_1$ stimulus component and an $\bar{A}_1$ response component, that is, $p(A_1B_1, \bar{A}_1\bar{B}_1)$, $p(A_1B_1, \bar{A}_1\bar{B}_2)$, $p(A_1B_2, \bar{A}_1\bar{B}_1)$, and $p(A_1B_2, \bar{A}_1\bar{B}_2)$. A similar derived matrix for the B component is not shown in Figure 2, since it is identical to the matrix of Figure 2b obtained on the one-dimensional series. The latter condition, demonstrated with the $\{B:\bar{B}\}$ matrix, defines the condition of no general interaction. On the other hand, the fact that the derived $\{A:\bar{A}\}$ matrix is different from the unidimensional $\{A:\bar{A}\}$ matrix defines the condition of a general A dimensional interaction.


a. The A dimension                   b. The B dimension

            Stimulus                             Stimulus
Response    A1     A2                Response    B1     B2
   A1      .40    .20                   B1      .45    .10
   A2      .10    .30                   B2      .05    .40

T(A:Ā)1D = .124510                   T(B:B̄)1D = .397313


c. The AB bidimension: Showing an A general dimensional interaction

                      Stimulus
Response    A1B1    A1B2    A2B1    A2B2
  A1B1      .225    .050    .045    .010
  A1B2      .025    .200    .005    .040
  A2B1      .000    .000    .180    .040
  A2B2      .000    .000    .020    .160

T(AB:ĀB̄) = 1.007209


d. The A dimension derived from the two-dimensional matrix in Figure 2c

            Stimulus
Response    A1     A2
   A1      .50    .10
   A2      .00    .40

T(A:Ā)2D = .609986


FIG. 2. Hypothetical stimulus-response matrices in the identification of unidimensionally and bidimensionally varying stimulus sets.
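
The summing operation that produces a derived one-dimensional matrix can be sketched in a few lines of Python. The function name `collapse` and the index convention (stimulus or response k in 0..3 standing for A level k // 2 and B level k % 2) are assumptions made for this illustration; the probabilities are those of Figure 2c.

```python
# Joint stimulus-response probabilities from Figure 2c.
# Columns are stimuli, rows are responses, both ordered A1B1, A1B2, A2B1, A2B2.
FIG_2C = [
    [0.225, 0.050, 0.045, 0.010],
    [0.025, 0.200, 0.005, 0.040],
    [0.000, 0.000, 0.180, 0.040],
    [0.000, 0.000, 0.020, 0.160],
]

def collapse(joint, component):
    """Sum a 4x4 AB matrix down to a 2x2 matrix for one component.

    component=0 keeps the A component, component=1 keeps the B component.
    Index k in 0..3 encodes (A level, B level) as (k // 2, k % 2).
    """
    out = [[0.0, 0.0], [0.0, 0.0]]
    for resp in range(4):
        for stim in range(4):
            r = (resp // 2, resp % 2)[component]
            s = (stim // 2, stim % 2)[component]
            out[r][s] += joint[resp][stim]
    return out

derived_a = collapse(FIG_2C, 0)   # corresponds to Figure 2d
derived_b = collapse(FIG_2C, 1)   # corresponds to the matrix of Figure 2b

print([[round(p, 3) for p in row] for row in derived_a])
print([[round(p, 3) for p in row] for row in derived_b])
```

Under these assumptions the derived A matrix differs from the one-dimensional A matrix, while the derived B matrix reproduces the one-dimensional B matrix, which is the pattern the text describes.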
Information measure for general dimensional interaction. Each of the matrices in Figure 2 is a set of stimulus-response probabilities. It is natural therefore to calculate for each the information transmitted from stimulus to response.3 These can be represented as $T(AB:\bar{A}\bar{B})$, the information transmitted in the two-dimensional series, Figure 2c; $T(A:\bar{A})_{1D}$ and $T(B:\bar{B})_{1D}$, the one-dimensional information transmission, Figures 2a and 2b; and $T(A:\bar{A})_{2D}$ and $T(B:\bar{B})_{2D}$, the one-dimensional information transmission from the derived $\{A:\bar{A}\}$ and $\{B:\bar{B}\}$ matrices. The condition for a general dimensional interaction is then simply that $T(A:\bar{A})$ or $T(B:\bar{B})$ for the matrix of the one-dimensional series and for the matrix derived from the two-dimensional series be different. Note that this interaction can be either incremental, where the stimulus components are more accurately identified, as in the example of Figure 2 (compare Figures 2a and 2d), or decremental, in which case $T(A:\bar{A})_{2D}$ is less than $T(A:\bar{A})_{1D}$.


a. Submatrix for B1                  b. Submatrix for B2

            Stimulus                             Stimulus
Response   A1B1   A2B1               Response   A1B2   A2B2
   A1      .250   .050                  A1      .250   .050
   A2      .000   .200                  A2      .000   .200

T_B1(A:Ā) = .609986                  T_B2(A:Ā) = .609986


FIG. 3. {A:Ā} submatrices derived from the data of Figure 2c. (The probabilities do not sum to one in the derived submatrices; therefore, calculations of T are based on the entries in the submatrices normalized to Σp = 1.)


Specific Components Interaction

A second kind of interaction is that concerned with the (inferred) identifiability, in the two-dimensional set, of a given stimulus component, for example, $A_2$, from one value of B to another.


3 The measure of the amount of information transmitted from one variable to another, T(X:Y), can be calculated in different ways (see Attneave, 1959; Garner, 1962); for example:

$$T(X:Y) = H(X) + H(Y) - H(X, Y)$$

where

$$H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i;$$

$$H(Y) = -\sum_{j=1}^{M} p_j \log_2 p_j;$$

$$H(X, Y) = -\sum_{i=1}^{N} \sum_{j=1}^{M} p_{ij} \log_2 p_{ij}.$$

For the multivariate condition, the information transmitted from one set of variables to another is given by:

$$T(XY:WZ) = H(XY) + H(WZ) - H(XY, WZ).$$
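
The formula in Footnote 3 translates directly into code. The sketch below is a minimal Python version, assuming the joint matrix already sums to one; the names `entropy` and `transmitted` are chosen here for illustration, and the example matrix is the one-dimensional A series of Figure 1a.

```python
from math import log2

def entropy(probs):
    """Shannon uncertainty in bits; zero-probability cells contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def transmitted(joint):
    """T(X:Y) = H(X) + H(Y) - H(X, Y) for a joint probability matrix."""
    row_totals = [sum(row) for row in joint]          # response marginals
    col_totals = [sum(col) for col in zip(*joint)]    # stimulus marginals
    cells = [p for row in joint for p in row]
    return entropy(row_totals) + entropy(col_totals) - entropy(cells)

# Figure 1a: columns are stimuli A1, A2; rows are responses.
FIG_1A = [[0.40, 0.20], [0.10, 0.30]]
t_a = transmitted(FIG_1A)
print(round(t_a, 4))
```

The same function applies unchanged to the 4 × 4 bidimensional matrices, since the multivariate form of the formula treats each stimulus pair and each response pair as a single category.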


In a four-alternative set formed by combining two values on A with two values on B, there are two $\{A:\bar{A}\}$ submatrices derivable from the bidimensional series matrix, one each for $B_1$ and $B_2$. These are given, for the data of Figure 2c, in Figures 3a and 3b. In those data the two $\{A:\bar{A}\}$ submatrices are identical. Similarly the two $\{B:\bar{B}\}$ submatrices in the same data have also been made to be the same. The data of Figure 2 therefore show an A general dimensional interaction, no B general interaction, and neither an A nor a B specific components interaction. This means that while in general the A components were more distinctive (readily identified) when in the bidimensional context than when alone, there were no differences in A component identifiability from one value of B to another. An example of an A component interaction is given in Figure 4. The values of A are more readily identified when presented with $B_2$ (Figure 4c) than with $B_1$ (Figure 4b).


a. Hypothetical stimulus-response matrix for the AB set

                      Stimulus
Response    A1B1    A1B2    A2B1    A2B2    Total
  A1B1      .120    .130    .120    .026     .396
  A1B2      .030    .120    .030    .024     .204
  A2B1      .080    .000    .080    .104     .264
  A2B2      .020    .000    .020    .096     .136

T(AB:ĀB̄) = .369426


b. Submatrix for B1,                 c. Submatrix for B2,
   derived from Figure 4a               derived from Figure 4a

            Stimulus                             Stimulus
Response   A1B1   A2B1               Response   A1B2   A2B2
   A1       .15    .15                  A1       .25    .05
   A2       .10    .10                  A2       .00    .20

T_B1(A:Ā) = 0                        T_B2(A:Ā) = .609986


d. Submatrix for A1,                 e. Submatrix for A2,
   derived from Figure 4a               derived from Figure 4a

            Stimulus                             Stimulus
Response   A1B1   A1B2               Response   A2B1   A2B2
   B1       .20    .13                  B1       .20    .13
   B2       .05    .12                  B2       .05    .12

T_A1(B:B̄) = .064433                  T_A2(B:B̄) = .064433


FIG. 4. Data on a bidimensional series showing an A specific components interaction.
Information measure for specific components interaction. The transmitted information function of the $\{A:\bar{A}\}$ submatrices, such as those in Figure 3, can be designated $T_{B_1}(A:\bar{A})$, $T_{B_2}(A:\bar{A})$. The weighted mean of these, $T_B(A:\bar{A})$, reflects the effects of the B values on the overall identifiability of the A components. The $T(A:\bar{A})_{2D}$ is a measure of the identifiability of the A components ignoring the effects of B. The comparison $T_B(A:\bar{A}) - T(A:\bar{A})_{2D}$ measures the specific components interaction effects. The interaction effect is not directly measured by the differences among the $T_{B_j}(A:\bar{A})$ measures, unless the row totals of the several submatrices are the same. If the rows are free to vary, then it is possible to have each $\{A:\bar{A}\}_{B_j}$ submatrix yield the same T measure and still have a large value for $T_B(A:\bar{A}) - T(A:\bar{A})_{2D}$.

The informational analysis for the data in Figure 4 gives the following results, beginning with the B specific components interaction:

$$T_{A_1}(B:\bar{B}) = T_{A_2}(B:\bar{B}) = T_A(B:\bar{B}) = T(B:\bar{B})_{2D} = .064433$$

There is no B components interaction. Each derived $\{B:\bar{B}\}$ submatrix has the same transmitted information (adjusting the probabilities in each matrix to sum to unity), which is the same as the transmitted information calculated from the $\{B:\bar{B}\}_{2D}$ matrix. For the $\{A:\bar{A}\}$ submatrices, on the other hand:

$$
\begin{aligned}
T_{B_1}(A:\bar{A}) &= .000000 \\
T_{B_2}(A:\bar{A}) &= .609986 \\
T_{B}(A:\bar{A}) &= .304993 \\
T(A:\bar{A})_{2D} &= .124510 \text{ as before.}
\end{aligned}
$$

The difference $T_B(A:\bar{A}) - T(A:\bar{A})_{2D}$, or .180483, is a measure of the amount of the A specific component interaction.

As is discovered later, since the case where both an A and B components interaction occur has some special features, an example of this is provided in the data of Figure 5.
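
The A specific components interaction for the Figure 4 data can be reproduced step by step. In this sketch the helper names (`transmitted`, `a_submatrix`) are assumptions for illustration, each submatrix is normalized to sum to one before T is computed (as the caption of Figure 3 prescribes), and the weights of the mean are taken to be the total probability falling in each submatrix.

```python
from math import log2

# Figure 4a: columns are stimuli, rows are responses, ordered A1B1, A1B2, A2B1, A2B2.
FIG_4A = [
    [0.120, 0.130, 0.120, 0.026],
    [0.030, 0.120, 0.030, 0.024],
    [0.080, 0.000, 0.080, 0.104],
    [0.020, 0.000, 0.020, 0.096],
]

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def transmitted(joint):
    """T for a matrix of probabilities, normalized first so the entries sum to one."""
    total = sum(sum(row) for row in joint)
    norm = [[p / total for p in row] for row in joint]
    rows = [sum(r) for r in norm]
    cols = [sum(c) for c in zip(*norm)]
    cells = [p for r in norm for p in r]
    return entropy(rows) + entropy(cols) - entropy(cells)

def a_submatrix(joint, b_level):
    """2x2 A matrix for stimuli at one B level, summed over the B response."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for resp in range(4):
        for stim in range(4):
            if stim % 2 == b_level:
                out[resp // 2][stim // 2] += joint[resp][stim]
    return out

subs = [a_submatrix(FIG_4A, b) for b in (0, 1)]
weights = [sum(sum(row) for row in m) for m in subs]      # share of trials per B level
t_each = [transmitted(m) for m in subs]                   # T_B1(A:A-bar), T_B2(A:A-bar)
t_b = sum(w * t for w, t in zip(weights, t_each))         # weighted mean T_B(A:A-bar)

derived = [[subs[0][i][j] + subs[1][i][j] for j in range(2)] for i in range(2)]
t_2d = transmitted(derived)                               # T(A:A-bar)_2D
interaction = t_b - t_2d

print([round(t, 6) for t in t_each], round(t_b, 6), round(t_2d, 6), round(interaction, 6))
```

The two submatrix values differ sharply here, yet it is the difference between their weighted mean and the collapsed value that serves as the interaction measure.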


a. Hypothetical stimulus-response matrix for an AB set

                      Stimulus
Response    A1B1    A1B2    A2B1    A2B2    Total
  A1B1      .138    .100    .102    .032     .372
  A1B2      .012    .150    .048    .018     .228
  A2B1      .092    .000    .068    .128     .288
  A2B2      .008    .000    .032    .072     .112

T(AB:ĀB̄) = .416951


b. Submatrix for B1,                 c. Submatrix for B2,
   derived from Figure 5a               derived from Figure 5a

            Stimulus                             Stimulus
Response   A1B1   A2B1               Response   A1B2   A2B2
   A1       .15    .15                  A1       .25    .05
   A2       .10    .10                  A2       .00    .20

T_B1(A:Ā) = 0                        T_B2(A:Ā) = .609986


d. Submatrix for A1,                 e. Submatrix for A2,
   derived from Figure 5a               derived from Figure 5a

            Stimulus                             Stimulus
Response   A1B1   A1B2               Response   A2B1   A2B2
   B1       .23    .10                  B1       .17    .16
   B2       .02    .15                  B2       .08    .09

T_A1(B:B̄) = .238254                  T_A2(B:B̄) = .001286


f. Response-response matrix, derived from Figure 5a

            B response
A response   B1     B2
   A1       .372   .228
   A2       .288   .112

T(Ā:B̄) = .007813


FIG. 5. Data on a bidimensional series showing both A and B specific components interaction.


CORRELATION

The other major concept in dimensional combination is that of correlation. By analogy to the standard concept, correlation can be interpreted as the tendency to associate perceptually particular values on one dimension with particular values on the other. It is related to the concept of perceptual independence Garner and Morton (1969) discussed. We justify this usage both on the grounds of general conceptual similarity to the statistical term (the degree to which a quantitative attribute, x, tends to covary with another attribute, y) and on the specific characteristic that the uncertainty terms measuring the effect are essentially monotonic with the linear correlation (squared) defined, with some simplifying assumptions, on the same x, y matrix, including the important feature that when the informational “correlation” term is zero, the statistical correlation is also zero.


a. The one-dimensional shape series  b. The one-dimensional color series

            Stimulus                             Stimulus
Response   cross   star              Response    red    gold
  cross     .70    .20                 red       .75    .30
  star      .30    .80                 gold      .25    .70


c. Expected response-response matrix for the A1B1 (red cross) stimulus on the assumption of independent dimensions

             Color response
Shape        red              gold             Total
  cross      (.70)(.75) = .525   (.70)(.25) = .175   .70
  star       (.30)(.75) = .225   (.30)(.25) = .075   .30
  Total      .75              .25

T_A1B1(Ā:B̄) = 0


d. Response matrix for A1B1 showing   e. Response matrix for A1B1 showing
   largest positive correlation          largest negative correlation
   consistent with the fixed             consistent with the fixed
   marginal totals                       marginal totals

           red    gold   Total                   red    gold   Total
  cross    .70    .00    .70            cross    .45    .25    .70
  star     .05    .25    .30            star     .30    .00    .30
  Total    .75    .25                   Total    .75    .25

T_A1B1(Ā:B̄) = .616272                 T_A1B1(Ā:B̄) = .153078


FIG. 6. Hypothetical stimulus-response and response-response matrices showing correlation effects. (Since here we are analyzing individual response-response matrices, it is more convenient to look at conditional response probabilities.)
Consider as an example, the two one-dimensional series of $A_1$ = cross, $A_2$ = star, $B_1$ = red, $B_2$ = gold, with the S-R matrices in Figures 6a and 6b. If the two dimensions are perceptually “independent,” then the expected responses to the $A_1B_1$ stimulus (the red cross) from the AB set can be calculated according to reasoning illustrated in Figure 6c. The probability of perceiving the red cross as the red cross is the probability of perceiving the color as red times the probability of perceiving the shape as cross; the probability of perceiving the red cross as the gold star is the product of the probabilities of perceiving red as gold and cross as star, etc. If the response-response matrix in Figure 6c is interpreted as a bivariate distribution in color and shape, the correlation in that distribution is zero, that is, the variables color and shape are statistically independent. Reasoning analogously, we can interpret the response matrix in Figure 6d as one reflecting a positive correlation, and the one in Figure 6e as reflecting a negative correlation between color and shape. Note that in each case the dimensional correlation is calculated from the responses to a single stimulus in the series, hence there are as many estimates of the correlation as there are bidimensional stimuli. Note also that the different correlations in Figures 6c, 6d, and 6e are all generated with the same marginal totals for the separate dimensions.

Informational measure for correlation. Each of the stimuli in the bidimensional set yields a $T_{A_iB_j}(\bar{A}:\bar{B})$ measure on the response-response matrix such as that of Figures 6c, 6d, and 6e. It is obvious by inspection that when the correlation is zero (as in Figure 6c) $T_{A_iB_j}(\bar{A}:\bar{B}) = 0$. Therefore, the average of these measures, designated $T_{AB}(\bar{A}:\bar{B})$, is directly a measure of the overall correlation effects.


PERCEPTUAL RESPONSE BIAS


Another effect in the analysis is that related to the
response-response matrix $\{\hat{A}:\hat{B}\}$ in Figure 7, which
is derived from the row (response) totals of the
$\{AB:\hat{A}\hat{B}\}$ matrix. It is called response bias
because it reflects the degree to which particular $\hat{A}$
responses and $\hat{B}$ responses tend to occur together. The
difference in interpretation between correlation and response
bias is crucial. If, for example, the A series is brightness and
the B series size, in the bidimensional series a correlation
term would arise if the brightness was overestimated (with
respect to the prediction based on independence) with a larger
stimulus and underestimated with a smaller one; response bias
implies, on the other hand, that the brightness is overestimated
because of the "large size" response to the stimulus (regardless
of the actual size of the stimulus). In other words, correlation
is related to stimulus characteristics, whereas response bias is
related to response characteristics. Logically, the two concepts
can be separated easily. In practice, however, it is difficult
to say, if a large-bright tendency should be observed, whether
the brightness response goes directly with the response to the
stimulus or to the stimulus itself (since the response, "large,"
tends to be associated with the stimulus, "large").

FIG. 7. Response-response matrices showing response bias effect.
(a. No response bias effect, derived from Figure 4a:
$T(\hat{A}:\hat{B}) = 0$. b. Response bias effect, derived from
Figure 5a: $T(\hat{A}:\hat{B}) = .007813$.)

Informational measure for response bias. The
$\{\hat{A}:\hat{B}\}$ matrix generates the informational
quantity $T(\hat{A}:\hat{B})$. As the symbol implies,
$T(\hat{A}:\hat{B})$ is related to the correlation; it is the
transmitted information in the matrix formed by summing the
$\{\hat{A}:\hat{B}\}_{A_iB_j}$ matrices, while
$T_{AB}(\hat{A}:\hat{B})$ is the (weighted) sum of the
transmitted information in the four matrices.
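The distinction between the two quantities (information in the summed matrix versus the weighted sum of the information in the separate matrices) can be made concrete with a short sketch. The two response-response matrices below are hypothetical and use only two stimuli for brevity; the function names are illustrative. Each matrix is built as a product of its own marginals, so the correlation term is zero, yet the summed matrix still transmits information because the response ratios differ from one matrix to the other.

```python
from math import log2


def transmitted_information(table):
    """Transmitted information T, in bits, between rows and columns of a table."""
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(col) / total for col in zip(*table)]
    t = 0.0
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            p = cell / total
            if p > 0:
                t += p * log2(p / (row_p[i] * col_p[j]))
    return t


def correlation_term(matrices, weights):
    """Weighted sum of T over the per-stimulus response-response matrices."""
    return sum(w * transmitted_information(m) for m, w in zip(matrices, weights))


def response_bias_term(matrices, weights):
    """T of the single matrix formed by summing the weighted matrices."""
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    summed = [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return transmitted_information(summed)


# Hypothetical matrices (rows: A responses, columns: B responses).
stimulus_1 = [[0.81, 0.09], [0.09, 0.01]]  # product of marginals .9/.1 and .9/.1
stimulus_2 = [[0.01, 0.09], [0.09, 0.81]]  # product of marginals .1/.9 and .1/.9
weights = [0.5, 0.5]

t_correlation = correlation_term([stimulus_1, stimulus_2], weights)
t_bias = response_bias_term([stimulus_1, stimulus_2], weights)
print(t_correlation, t_bias)
```

In this sketch the correlation term vanishes while the response bias term is clearly positive, which is the situation the text describes as arising whenever the response ratios differ between submatrices.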


INTERACTIVE BIAS 


The last effect in the analysis, related to the derived
$\{B:\hat{A}\}$ and $\{A:\hat{B}\}$ matrices, is both interaction
and response bias; response bias because it is concerned with
the probability of responses with respect to a given attribute,
such as $\hat{B}_1$, $\hat{B}_2$, without consideration of the
stimulus attributes $B_1$ and $B_2$ directly corresponding;
interaction because it reflects the degree to which these
response probabilities differ over values of the stimulus
attribute on the other dimension, A.

The corresponding informational measures are $T(A:\hat{B})$ and
$T(B:\hat{A})$.
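

THE INFORMATION EQUATION


The fundamental relation among the quantities defined above is
given in Equation 1:

$$T(AB:\hat{A}\hat{B}) = T_B(A:\hat{A}) + T_A(B:\hat{B}) + T(B:\hat{A}) + T(A:\hat{B}) + T_{AB}(\hat{A}:\hat{B}) - T(\hat{A}:\hat{B}). \qquad [1]$$

Defining the general interaction as:

$$\begin{aligned}
{}_G I_A &= T(A:\hat{A})_{2D} - T(A:\hat{A})_{1D} \\
{}_G I_B &= T(B:\hat{B})_{2D} - T(B:\hat{B})_{1D},
\end{aligned} \qquad [2]$$

and defining the specific components interaction as:

$$\begin{aligned}
{}_S I_A &= T_B(A:\hat{A}) - T(A:\hat{A})_{2D} \\
{}_S I_B &= T_A(B:\hat{B}) - T(B:\hat{B})_{2D},
\end{aligned} \qquad [3]$$

Equation 1 can be rewritten as

$$\begin{aligned}
T(AB:\hat{A}\hat{B}) = {} & T(A:\hat{A})_{1D} + T(B:\hat{B})_{1D} + {}_G I_A + {}_G I_B + {}_S I_A + {}_S I_B \\
& + T(B:\hat{A}) + T(A:\hat{B}) + T_{AB}(\hat{A}:\hat{B}) - T(\hat{A}:\hat{B})
\end{aligned} \qquad [4]$$

where ${}_G I_A$ and ${}_G I_B$ can be positive or negative, all
other terms being intrinsically positive.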
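The step from Equation 1 to Equation 4 is a direct substitution of the two interaction definitions. It is written out below as a sketch, assuming the notation in which $\hat{A}$ and $\hat{B}$ denote responses and the subscripts 1D and 2D mark measures taken on the unidimensional and bidimensional series; no new quantities are introduced.

```latex
\begin{align*}
T(A:\hat{A})_{2D} &= T(A:\hat{A})_{1D} + {}_G I_A
    && \text{(rearranging [2])} \\
T_B(A:\hat{A}) &= T(A:\hat{A})_{2D} + {}_S I_A
    = T(A:\hat{A})_{1D} + {}_G I_A + {}_S I_A
    && \text{(rearranging [3])} \\
T_A(B:\hat{B}) &= T(B:\hat{B})_{2D} + {}_S I_B
    = T(B:\hat{B})_{1D} + {}_G I_B + {}_S I_B
    && \text{(same steps for } B\text{)}
\end{align*}
```

Replacing the first two terms of Equation 1 with these expressions and leaving the remaining four terms unchanged gives Equation 4.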


The Relation among the Informational 
Components 


We will illustrate how the components are 
related in a number of cases, using the data of 
Figures 1-6. 


C0: No Bidimensional Effects


In the simplest case, illustrated in the data of Figure 1, there
are no bidimensional effects of any kind. The equation then has
the elegant form:


$$T(AB:\hat{A}\hat{B}) = T(A:\hat{A})_{1D} + T(B:\hat{B})_{1D}.$$


C1: Only a General Interaction in A or B or Both


As Equations 1-4 make clear, the general interaction, involving
a comparison of the $\{A:\hat{A}\}_{1D}$ and $\{B:\hat{B}\}_{1D}$
with the corresponding summary $\{A:\hat{A}\}_{2D}$ and
$\{B:\hat{B}\}_{2D}$ matrices, is independent of all other
effects. Hence we can ignore it in covering the remaining cases.
In what follows all symbols refer to measures on the
bidimensional matrix.


C2: Specific Components Interaction in A or B or Both


Only an A or B components interaction, but not both. An A
interaction effect alone can be introduced into the data of
Figure 1 by constructing the two submatrices
$\{A:\hat{A}\}_{B_1}$ and $\{A:\hat{A}\}_{B_2}$ and changing the
cell entries subject to the two conditions: (a) The sum of the
two submatrices must be $\{A:\hat{A}\}$; otherwise, the general
interaction would be affected; (b) the marginal totals of the
two submatrices must be maintained. The column totals are fixed
by the conditions of the experiment. The row totals could be
different from one submatrix to another, in which case there
would be introduced a $T(B:\hat{A})$ effect (the row totals are
the probabilities of $\hat{A}_1$ and $\hat{A}_2$ for the given
value of B. If they are different for different B stimulus
values, there must be a $T(B:\hat{A})$ effect). In Figure 4, an A
components interaction has been constructed, subject to the two
conditions, from the data of Figure 1. The information formula
can be written


$$T(AB:\hat{A}\hat{B}) = T_B(A:\hat{A}) + T(B:\hat{B}). \qquad [5a]$$


A similar procedure with the B interaction only would give an
analogous result:


$$T(AB:\hat{A}\hat{B}) = T(A:\hat{A}) + T_A(B:\hat{B}). \qquad [5b]$$


Both an A and a B components interaction. Again subject to the
two conditions stated above, an A and B interaction can be
introduced simultaneously. As the data of Figure 5 illustrate,
when both interactions occur, there is automatically an effect
on the $T(\hat{A}:\hat{B})$ term, so that the equation is


$$T(AB:\hat{A}\hat{B}) = T_B(A:\hat{A}) + T_A(B:\hat{B}) - T(\hat{A}:\hat{B}).$$


It is, in general, impossible to manipulate both the A and B
interactions, while maintaining a zero $T_{AB}(\hat{A}:\hat{B})$
term, without unbalancing the response ratios
$\hat{A}_1\hat{B}_1/\hat{A}_1\hat{B}_2$ and
$\hat{A}_2\hat{B}_1/\hat{A}_2\hat{B}_2$, in which case
$T(\hat{A}:\hat{B}) > 0$. Looking at it the other way, the
confounding implies that it is impossible fully to resolve an
observed effect into one or the other category.
C3: Nonzero Interactive Bias


An interactive bias effect occurs when the submatrices
$\{A:\hat{A}\}_{B_1}$ and $\{A:\hat{A}\}_{B_2}$ have different
row ($\hat{A}$) probability totals. It is impossible to do this
without at the same time causing a components interaction and a
response bias effect.

The three effects, interaction, response bias, and interactive
bias, have a hierarchical dependency structure. A given
interaction can be varied holding $T(A:\hat{B})$ and
$T(\hat{A}:\hat{B})$ constant; interaction and response bias can
be varied independently of $T(A:\hat{B})$; but, in the other
direction, a $T(A:\hat{B}) > 0$ implies both an interaction and
bias effect, and a $T(\hat{A}:\hat{B})$ effect implies a
component interaction. These results hold for the correlation
effect held constant.


C4: Correlation


By definition the $T_{A_iB_j}(\hat{A}:\hat{B})$ correlation can
be varied independently of the marginal totals in the
$\{\hat{A}:\hat{B}\}_{A_iB_j}$ matrices, within the limits
imposed by these totals. Since the interaction and interactive
bias effects are defined by these totals, the correlation
effects can be varied for fixed values of $T_B(A:\hat{A})$ and
$T(A:\hat{B})$. The relation between $T_{AB}(\hat{A}:\hat{B})$
and $T(\hat{A}:\hat{B})$ is more complex. As pointed out before,
$T(\hat{A}:\hat{B})$ is calculated on the sum of the
$\{\hat{A}:\hat{B}\}_{A_iB_j}$ matrices, while
$T_{AB}(\hat{A}:\hat{B})$ is the weighted sum of the T values for
each matrix. It is entirely possible to have, for example, a
zero correlation in each matrix, $T_{AB}(\hat{A}:\hat{B}) = 0$,
while the $T(\hat{A}:\hat{B})$ term is positive. This will be the
case generally if the $\hat{A}_1/\hat{A}_2$ or
$\hat{B}_1/\hat{B}_2$ ratios are different from one submatrix to
another. In general, then, varying $T_{AB}(\hat{A}:\hat{B})$ will
also result in a change in $T(\hat{A}:\hat{B})$. This is
illustrated in the data of Figure 8, where only the correlation
effects are varied, all interactions fixed at zero. In Figure 8a
both $T_{AB}(\hat{A}:\hat{B})$ and $T(\hat{A}:\hat{B})$ are zero.
In Figures 8b and 8c both are greater than zero. The data of
Figure 8b show a maximum positive correlation, subject to the
restrictions on the marginal totals of the
$\{\hat{A}:\hat{B}\}_{A_iB_j}$ matrices; in Figure 8c the
correlation is maximally negative subject to the same
restrictions.


a. Bidimensional series: No interaction or correlation effects;
$T(AB:\hat{A}\hat{B}) = T(A:\hat{A}) + T(B:\hat{B}) = .124510 + .397313$

Response | $A_1B_1$ | $A_1B_2$ | $A_2B_1$ | $A_2B_2$ | Total
$\hat{A}_1\hat{B}_1$ | .180 | .040 | .090 | .020 | .330
$\hat{A}_1\hat{B}_2$ | .020 | .160 | .010 | .080 | .270
$\hat{A}_2\hat{B}_1$ | .045 | .010 | .135 | .030 | .220
$\hat{A}_2\hat{B}_2$ | .005 | .040 | .015 | .120 | .180

$T(AB:\hat{A}\hat{B}) = .521824$


b. Bidimensional series: No interaction effects, correlation
(positive) + response bias term added;
$T(AB:\hat{A}\hat{B}) = T(A:\hat{A}) + T(B:\hat{B}) + T_{AB}(\hat{A}:\hat{B}) - T(\hat{A}:\hat{B}) = .124510 + .397313 + .185702 - .060022$

Response | $A_1B_1$ | $A_1B_2$ | $A_2B_1$ | $A_2B_2$ | Total
$\hat{A}_1\hat{B}_1$ | .200 | .050 | .100 | .050 | .400
$\hat{A}_1\hat{B}_2$ | .000 | .150 | .000 | .050 | .200
$\hat{A}_2\hat{B}_1$ | .025 | .000 | .125 | .000 | .150
$\hat{A}_2\hat{B}_2$ | .025 | .050 | .025 | .150 | .250

$T(AB:\hat{A}\hat{B}) = .647505$


c. Bidimensional series: No interaction effects, correlation
(negative) + response bias term added;
$T(AB:\hat{A}\hat{B}) = T(A:\hat{A}) + T(B:\hat{B}) + T_{AB}(\hat{A}:\hat{B}) - T(\hat{A}:\hat{B}) = .124510 + .397313 + .267877 - .080340$

Response | $A_1B_1$ | $A_1B_2$ | $A_2B_1$ | $A_2B_2$ | Total
$\hat{A}_1\hat{B}_1$ | .175 | .000 | .075 | .000 | .250
$\hat{A}_1\hat{B}_2$ | .025 | .200 | .025 | .100 | .350
$\hat{A}_2\hat{B}_1$ | .050 | .050 | .150 | .050 | .300
$\hat{A}_2\hat{B}_2$ | .000 | .000 | .000 | .100 | .100

$T(AB:\hat{A}\hat{B}) = .709360$


FIG. 8. Stimulus-response matrices showing correlation effects.


Summary of Analysis


The total bivariate information transmission has been analyzed
into two classes and four specific effects. Components
interaction and interactive bias belong to the class of
interaction; $T_{AB}(\hat{A}:\hat{B})$ to the class of
correlation. $T(\hat{A}:\hat{B})$ has characteristics of both
classes. While $T_{AB}(\hat{A}:\hat{B})$ and the interaction
terms can vary independently over some range, a component
interaction in both A and B on the one hand, and any correlation
effect on the other, implies a corresponding effect on the
$T(\hat{A}:\hat{B})$.
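The decomposition summarized above can be checked numerically against Figure 8b. The sketch below assumes the cell entries as read from that figure, with rows as responses and columns as stimuli, and with the third-row entry under the third stimulus taken as .125 so that the column sums to .25. The helper names (`pooled`, `response_matrix`) are illustrative and not part of the original analysis.

```python
from math import log2


def transmitted_information(table):
    """Transmitted information T, in bits, between rows and columns of a table."""
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(col) / total for col in zip(*table)]
    t = 0.0
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            p = cell / total
            if p > 0:
                t += p * log2(p / (row_p[i] * col_p[j]))
    return t


# Figure 8b as read here. Row and column order: A1B1, A1B2, A2B1, A2B2.
joint = [
    [0.200, 0.050, 0.100, 0.050],
    [0.000, 0.150, 0.000, 0.050],
    [0.025, 0.000, 0.125, 0.000],
    [0.025, 0.050, 0.025, 0.150],
]
a_of = [0, 0, 1, 1]  # level of A for each compound index
b_of = [0, 1, 0, 1]  # level of B for each compound index


def pooled(table, level_of):
    """Collapse the 4 x 4 table onto one dimension (responses by stimuli)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for r in range(4):
        for s in range(4):
            out[level_of[r]][level_of[s]] += table[r][s]
    return out


def response_matrix(cells):
    """Arrange four response cells as a 2 x 2 table (A response by B response)."""
    return [[cells[0], cells[1]], [cells[2], cells[3]]]


t_total = transmitted_information(joint)
t_a = transmitted_information(pooled(joint, a_of))
t_b = transmitted_information(pooled(joint, b_of))

# Correlation: weighted sum of T over the response matrix of each stimulus.
t_corr = sum(
    sum(col) * transmitted_information(response_matrix(col))
    for col in zip(*joint)
)

# Response bias: T of the response matrix formed from the row totals.
t_bias = transmitted_information(response_matrix([sum(row) for row in joint]))

reconstructed = t_a + t_b + t_corr - t_bias
print(t_total, t_a, t_b, t_corr, t_bias, reconstructed)
```

Since Figure 8b contains no interaction effects, the two single-dimension terms plus the correlation term minus the response bias term should reproduce the total transmission to within rounding of the published values.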


CONCLUSIONS 


Some concluding remarks about the implications of the above
analysis are in order. To begin, the analysis as presented is
directly applicable to the case of more than two values
per dimension. Also, in principle, a parallel in- 
formational analysis is possible for the case of 
three or even more dimensions. One attractive 
feature of the informational analysis that 
should not be overlooked is its nonmetric re- 
quirements (Tversky, 1967). 

A precise definition and an explicit procedure 
for evaluating the assumption of perceptual 
independence is now possible. Perceptual inde- 
pendence is the special case in which some or 
all of the effects defined above are zero. If one 
proceeds, as did Eriksen and Hake (1955) and
Lockhead (1966), to predict bidimensional re- 
sults from unidimensional results, then a gen- 
eral dimensional interaction as well as all other 
interaction and correlation effects are assumed 
nonexistent. Since Corcoran (1966), on the 
other hand, did not concern himself with the 
relationship of performance on the unidimen- 
sional stimulus sets to that on the bidimensional 
set, he would be uninterested in the general di- 
mensional interaction. A second aspect of the 
assumption of perceptual independence is re- 
flected in the Eriksen and Hake study. They 
used the identification responses to stimuli in 
single-dimension sets to predict performance 
on a bidimensional set which was actually a 
subset of the orthogonal set. On the assumption 
of independence they generated a hypothetical 
S-R matrix for the orthogonal set and then ap- 
plied a combinatorial algorithm to these data 
in order to predict performance on the subset 
(they used the correlated or redundant sub- 
set). As outlined above, all except one of the 
analyses for the effects of combining dimensions 
are conducted on the empirically determined, 
orthogonal bidimensional S-R matrix. Thus, 
the significance of the orthogonal bidimensional 
S-R data matrix should not be overlooked. 

The analysis based on the full orthogonal 
multidimensional stimulus set can readily be 
extended to generate predictions for identification responses in
any arbitrary multidimensional subset subject to assumptions
about the perceptual processes involved. This permits, at least
in principle, an experimental analysis of the "scanning" and
"storage" mechanisms which determine the distinctiveness of
stimuli.


TOKEN REINFORCEMENT PROGRAMS IN THE CLASSROOM:
A REVIEW


K. DANIEL O'LEARY AND RONALD DRABMAN


State University of New York, Stony Brook 


Although token reinforcement programs began less than a decade ago, their 
use in classrooms has grown rapidly in popularity as a therapeutic procedure. 
These programs have demonstrated effectiveness in changing the academic and 
social behavior of very diverse child populations. However, the use of token 
and backup reinforcement is but one procedure within a complex constellation 
of factors in the overall token reinforcement program. A number of such fac- 
tors are examined which may critically influence the success of a token program 
including the teacher, the child, the parent, and the system of
reinforcement. Methodological considerations such as type of
experimental design, observer bias, and replicability are
discussed. There are several methodological problems which
should be addressed in token reinforcement studies, but because
of the powerful nature of a token reinforcement program, the
generally positive results reported thus far will probably
withstand stringent methodological tests. On the other hand, the
long-term effectiveness of such programs has only begun to
receive attention, and a number of suggestions are made to
achieve such effectiveness.


Although token reinforcement programs in 
classrooms are a relatively new phenomenon, 
the idea that children should be rewarded for 
good behavior is certainly not a twentieth 
century innovation. Prizes such as nuts, figs, 
and honey were used to reward academic 
achievement in twelfth century teaching of 
the Torah (Birnbaum, 1962). In 1529, Eras- 
mus advocated cherries and cakes in place of 
the cane in teaching children Latin and Greek 
(Skinner, 1966). In England in the early 
nineteenth century, Lancaster gave pictures 
to children who were promoted (Curtis & 
Boultwood, 1960). Teachers have long used 
stars for academic achievement, and Sunday 
School teachers continue to award medals for 
perfect attendance, but the systematic distribution of prizes or
rewards on a frequent basis in the classroom has not been seen
until very recently. Before Staats developed
his token program in 1961 with children 
(Staats, Staats, Schutz, & Wolf, 1962) and 
Ayllon and Azrin launched their token rein- 
forcement program in 1961 with adult psy- 
chiatric patients (Ayllon & Azrin, 1968), few 
token reinforcement programs existed any- 
where. However, in less than a decade, at least 
100 token reinforcement programs have been 
established in this country, and many of these 
programs have been in classrooms. Because of 
the very rapid increase in token programs in 
classrooms, and because of the complexities 
of token programs which are unique to class- 
rooms, this review focuses almost solely on 
classroom token programs. The present review 
(a) examines the development of token programs, (b) evaluates
the effectiveness of token programs designed to influence the
behavior of all children in a class, (c) examines
some methodological problems often associ- 
ated with classroom token programs, and (d) 
reviews generalization research in token pro- 
grams and presents some suggestions for ob- 
taining generalization. 


DEVELOPMENT OF TOKEN PROGRAMS 


The basic ingredients of a token reinforcement program usually
include (a) a set of instructions to the class about the
behaviors that will be reinforced, (b) a means of making a
potentially reinforcing stimulus (usually called a token)
contingent upon behavior,
and (c) a set of rules governing the exchange 
of tokens for backup reinforcers such as 
prizes or opportunities to engage in special 
activities. The token is simply a stimulus, 
like a plastic chip or a numerical rating, which 
“stands for something” and which is exchangeable for certain
desired items or activities. Money is clearly the most important
token in our society, and there are rules 
governing its acquisition, distribution, and 
exchange. However, the variables governing 
man’s economic behavior range far beyond 
the simple laws of monetary acquisition, distribution, and
exchange. Similarly, the successes and failures of classroom
token reinforcement programs point to complex factors other than
the distribution and exchange of tokens, such as shaping,
teacher praise, and reprimands, which may influence the
effectiveness of a token program. The variables operating in a
token program are examined in
detail later, but first several token studies 
with animals are presented in order to look 
at the origins of token programs. 

In classic experiments, Wolfe (1936) and 
Cowles (1937) taught chimps to place chips 
or tokens in a slot to obtain grapes. Next, the 
chimps were taught to press a bar in order 
to obtain a chip. Finally, the animals were 
taught to obtain a specified number of chips 
before exchanging the chips for grapes or to 
wait a specified interval before exchange was 
possible. Wolfe and Cowles found that the 
chimps were able to learn a weight-lifting task 
with only poker chips as reinforcers, and thus 
they established that the tokens acquired 
secondary reinforcing properties. Smith (1939) 
and Kelleher (1958) further demonstrated 
that chimps would learn tasks when tokens 
which were exchangeable for food were made 
contingent upon correct responses. 

Following the general paradigms of the classic animal studies,
the initial token experiments with children were designed to
assess whether tokens acquired secondary reinforcing value and
to see whether children's behavior could be maintained over long
periods of time utilizing token reinforcement. Assumedly, by
pairing a neutral stimulus such as a chip with one primary
reinforcer, the chip acquires secondary reinforcing properties.
By pairing a neutral stimulus with more than 
one primary reinforcer, a chip becomes a gen- 
eralized reinforcer which will maintain its 
reinforcing value in spite of fluctuating de- 
privational levels for single reinforcers. Ex- 
periments by Meyers (1960) and Meyers, 
Craig, and Meyers (1961) suggested that 
tokens can be established as secondary rein- 
forcers by shaping and maintaining the be- 
havior of children at an experimental task 
where tokens were exchangeable for candy. 
However, as Bijou and Baer (1966) noted, 
the experiments by Meyers and Meyers et al. 
used the same response in training and extinc- 
tion and thus “the results may be attributed 
to the reinforcing and/or discriminative value 
of the tokens [p. 778].” In short, the sec- 
ondary reinforcing function of tokens was 
not unequivocally demonstrated in the labora- 
tory studies of Meyers and Meyers et al. with 
children. Furthermore, in neither of these 
studies was a generalized reinforcer estab- 
lished; the tokens were exchangeable for only 
one item. Staats, Finley, Minke, Wolf, and 
Brooks (1964) were one of the first groups to 
establish an extensive reinforcing system in
which tokens were exchangeable for a wide 
variety of edibles and toys. A child selected a 
toy for which he would “work” before be- 
ginning a training program. These experiments 
demonstrated that a token reinforcement sys- 
tem could maintain reading behavior of 4-year- 
old children for long periods of time. The ex- 
periment by Staats et al. (1964) was par- 
ticularly significant because it demonstrated 
that with a token system and a variety of
exchange items one is no longer dependent 
upon the power of a single backup reinforcer. 
That is, one is not limited to giving M&M 
candies whose power depends upon the mo- 
mentary deprivation state of the child. In- 
stead, the only limitation of backup rein- 
forcer systems is the ingenuity of the experi- 
menter. 

Since 1964 numerous classroom applications 
of token reinforcement programs have emerged 
with extensive backup reinforcement systems. 
Such programs were usually designed to im- 
prove social and academic behaviors of chil- 
dren who were only minimally influenced by 
normal classroom reinforcers such as stars, grades, and teacher
attention. Although some token programs have utilized only one
backup reinforcer such as candy (Quay, Sprague, Werry, &
McQueen, 1967), most classroom
token programs have extensive backup rein- 
forcer systems, and this review deals almost 
solely with those programs which have had 
more than one backup reinforcer. 

A basic assumption in classroom token pro- 
grams is that tokens will acquire reinforcing 
value by association with a variety of backup 
reinforcers. By having a large variety of 
backup reinforcers, it is likely that at least 
one prize will be desired by each child at any 
time. It is also thought that the continual
pairing of a teacher's praise with the token 
and backup reinforcers will result in the en- 
hancement of the teacher's praise as a posi- 
tive reinforcing stimulus. Because of the 
assumed enhancement of the teacher's rein- 
forcing value and increases in the children's 
academic and social repertoires, it often has 
been assumed that a token reinforcement system can gradually be
removed without a major loss of appropriate behavior.


EFFECTIVENESS OF TOKEN PROGRAMS 


Various classes of behavior have been used 
as dependent measures in token programs. 
The effectiveness of token programs is evalu- 
ated with respect to their probability of 
modifying four broad classes of behavior: (a) decreases in
disruptive behavior, (b) increases in study behavior, (c)
increases in academic achievement, and (d) changes in other
behaviors not selected as primary targets for remediation but
which may change as a function of the token program, for
instance, attendance and bartering. While changes in any one of
these classes of behavior may be correlated with changes in the
other classes, this discussion is based on the primary measures
used by the investigators. Critical commentary on design and
methodology is withheld until the later portion of this paper.
Unless mentioned otherwise, tokens were always exchangeable for
backup reinforcers.

Most token programs have utilized within-subject designs which
were characterized by ABAB designs or variants thereof.
Basically the ABAB design involves a pretreatment or base
period, a treatment or token period, a
return to the base period conditions, and 
finally a reinstatement of the treatment or ex- 
perimental conditions (Bijou, Peterson, Har- 
ris, Allen, & Johnston, 1969). Evaluations of 
classroom behavior were typically obtained 
from observers who were in the class through- 
out the study and who noted the frequency 
of various behaviors according to a specified 
observer code. Reliabilities of the classroom 
observations were usually obtained by hav- 
ing two observers simultaneously record the 
occurrence of specific behavior of one child 
during a specified time and then analyze the 
extent of observer agreement. Evaluations of 
academic behavior were generally obtained 
from standardized tests. The various types of 
tokens, methods of making tokens contingent 
upon appropriate behavior, and kinds of 
backup reinforcers are discussed later in the 
section on variables influencing the effective- 
ness of token programs. 
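The reliability check described above, in which two observers record the same child over the same intervals, can be illustrated with a short sketch. The interval-by-interval percentage of agreement used here is one common convention and is an assumption, since the text does not state which index the individual studies used; the records are hypothetical.

```python
def percent_agreement(observer_a, observer_b):
    """Percentage of intervals in which the two observers' records match."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must score the same intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Hypothetical records for ten intervals: 1 = disruptive behavior scored,
# 0 = not scored.
observer_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
observer_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

agreement = percent_agreement(observer_a, observer_b)
print(agreement)
```

A simple percentage of this kind does not correct for agreement expected by chance, which is one reason it is offered only as a sketch of the procedure.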


Decreases in Disruptive Behavior 


The first use of a token reinforcement 
program to control a large class (N = 17) of
emotionally disturbed children was by O’Leary 
and Becker (1967). A pad was put on each 
child’s desk in which the teacher placed a 
rating every 20 minutes. The ratings were ex- 
changeable for backup reinforcers, which were 
initially available every day. The introduction 
of the token program resulted in a decrease 
in average disruptive behavior (talking, noise, 
pushing, eating) from 76% in the base period 
to an average of 10% during the 2-month 
token period. Delay of backup reinforcement 
was gradually increased to 4 days without an 
increase in disruptive behavior. The program 
was equally successful for all children ob- 
served, and anecdotal evidence suggested that 
the children’s appropriate behavior general- 
ized to other school situations. 

More recently, O'Leary, Becker, Evans, and 
Saudargas (1969) observed the behavior of 
seven disruptive children in a second-grade 
class of 21 children for 8 months. During the 
6-week base period, the teacher was asked to 
handle disruptive behavior in whatever way 
she felt appropriate. For the following 3 
weeks (Rules), the teacher placed a set of 
rules on the blackboard and reviewed these rules with the class
at least twice a day. During the next phase (Structure), the
teacher was asked to reorganize her afternoon academic program
into four ½-hour sessions.
Next, for 2 weeks, in addition to continuing 
Rules and Structure, she was asked to praise 
appropriate behavior and to ignore disruptive 
behavior (Praise and Ignore). Rules, Struc- 
ture, and Praise and Ignore remained in ef- 
fect throughout the rest of the study. After 
2 weeks of the Praise-and-Ignore condition, 
the token program (Token I) began in which 
the children received ratings each afternoon 
which were exchangeable for backup rein- 
forcers. After 5 weeks, the token program 
was withdrawn (Withdrawal). The token pro- 
gram was then reinstated for a 2-week period 
(Token II), and finally Token II was re- 
placed by a form of a token program in 
which children received stars exchangeable 
for one or two pieces of candy per week, ac- 
cording to the quality of their own behavior 
as well as the behavior of their peers (boys 
versus girls). There were some individual dif- 
ferences in the reactions of children to the 
various phases of the study, but most im- 
portantly, Rules, Structure, and Praise and 
Ignore generally were not effective in reduc- 
ing disruptive behavior. On the other hand, 
the token program (which consisted of Rules, 
Structure, Praise and Ignore, Tokens, and 
Backup Reinforcers) and the variations of 
the token program which utilized peer com- 
petition were effective in reducing disruptive 
behavior. The effects of the afternoon token 
program did not generalize to the morning 
when the token program was not in operation. 

Meichenbaum, Bowers, and Ross (1968) 
used money receipts as tokens with 10 in- 
stitutionalized female adolescent offenders. 
Their dependent variable was inappropriate 
classroom behavior. Behavior during the after- 
noon token period was appreciably better than 
during a base period, but the girls actually 
became worse during the mornings when the 
token program was not in effect. The girls 
manipulated the experimenters into expand- 
ing the program to the morning with com- 
ments such as, “If you don't pay us, we won't 
shape up [p. 349].” With the consequent
initiation of the token program in the morn- 
ing, the behavior of the girls improved in the 
morning. In brief, different investigators repeatedly reported
significant decreases in
disruptive behavior associated with token 
programs (Kuypers, Becker, & O’Leary, 1968; 
Martin, Burkholder, Rosenthal, Tharpe, & 
Thorne, 1968). 


Increases in Study Behavior 


Bushell, Wrobel, and Michaelis (1968) 
evaluated the effect of contingent and non- 
contingent special events on the study be- 
havior of 12 preschool children with above 
average intelligence. The records of all 12 
students indicated that noncontingent rein- 
forcement was less effective in sustaining 
study behavior than contingent reinforcement. 

Walker, Mattson, and Buckley (1969) de- 
vised a treatment program for six “hyperac- 
tive, disruptive, and acting-out" fourth, fifth, 
and sixth graders with average or above in- 
telligence. Two children were brought into 
this class at a time, until there was a total 
of six children in the class. This method of 
introducing children into a treatment setting 
was called "staging" by Walker et al. To en- 
hance the effectiveness of their treatment, a 
number of procedures were introduced simul- 
taneously, such as programmed instruction, 
charts kept by the children of their points 
earned, time-out from reinforcement, group 
points for appropriate behavior, and parental 
involvement. The six children in the program 
increased their proportion of task-oriented be- 
havior from an average of 39% in the base 
period in the regular class to 90% in the 
token program in the special class. The six 
children returned to their regular classes from 
2:00 to 3:00 each day, and by the end of the 
fourth week of the treatment program, the 
behavior of all six subjects was “indistinguishable within two
settings [p. 69],” with task-oriented behavior averaging
approximately 90%. Following the token program in
the special class, observations were made of 
the children’s “posttreatment” task-oriented 
behavior in their regular classes, and at the 
end of the 3 months their task-oriented be- 
havior was at 72% (range 39%-97%) of 
what it had been during 2.5 months of treat- 
ment in the experimental class. 

Broden, Hall, Dunlap, and Clark (1970) 
established a token program in a class of seventh- and
eighth-grade students who were
all several years behind in at least one aca- 
demic subject and who displayed disruptive 
classroom behavior. Base-line data were ob- 
tained during a general reading class—one of 
the five periods during the day. The rate of 
study behavior during this base period was 
29%. Making social reinforcement contingent 
upon study behavior increased the study rate 
to 57%. Next, a simplified token program 
was begun in which pupils received points 
worth 1 extra minute of lunch if they were 
in their seats and quiet when a timer sounded. 
As a result of the use of general timer pro- 
cedures, the study behavior rose to 74%. 
However, the increase in study rate did not 
generalize to nontoken periods of the day. 
Consequently, a token program involving gain 
and loss of points was initiated throughout the 
day, and study behavior during the whole 
day averaged above 80%. Withdrawals of 
the two different token programs indicated 
that the programs were functionally related 
to the improved behavior. 


Increases in Academic Achievement 


Birnbrauer, Wolf, Kidder, and Tague 
(1965) monitored the number of items com- 
pleted and the percentage of errors of 15 
retarded children in a token reinforcement 
program of a programmed instruction classroom. The pupils were
in the token program
for at least 2 months, and relatively high 
levels of accuracy and study were obtained. 
The token program was then withdrawn for 
approximately 1 month and subsequently re- 
instated. During the no-token period, three 
general patterns of results were obtained: (a) 5 of the 15
pupils showed no measurable change in performance; (b) 6 pupils
increased either markedly in overall percentage of errors or
sufficiently to reduce progress in the programs; (c) 4 pupils
showed an increase in percentage of errors and a decline (or
considerable variability) in amount of studying. With the
reinstatement of the token program, high levels of accuracy and
rates of studying were obtained.

In one of the largest token reinforcement 
studies, Hewett, Taylor, and Artuso (1969)
formed six classrooms of 8- to 11-year-old 
emotionally disturbed children. There were 
nine students per class and the classes were
matched for IQ, age, reading, and achieve- 
ment level. One class (E) received tokens 
for the entire year. Another class (C) served 
as a control and received no tokens for the 
entire year. Two more classes (CE) had con- 
trol procedures the first semester and tokens 
the second semester. Finally, two classes 
(EC) received tokens the first semester 
and control procedures the second semester. 
The three dependent measures were arithmetic 
achievement (California Achievement Test, 
CAT), reading achievement (CAT), and task 
attention. There was greater improvement in
arithmetic and task attention in Class E than 
in Class C. The two CE classes showed 
greater improvement in arithmetic and task 
attention during Semester 2 than did Class C. 
However, the two EC classes showed a sig- 
nificant increase in task attention when tokens 
were withdrawn when compared with Class 
E. The removal of tokens was not associated 
with a change in reading or arithmetic. It is 
not clear what accounted for the increase in 
task attention following the removal of the 
token program, but such a finding does lend 
support to the notion that children do not 
become totally dependent upon backup rein- 
forcers. On the other hand, one might also 
conclude that task attention was suppressed 
in the EC classes during the token program. 
For example, it is possible that certain aspects 
of the token program were executed in a 
manner that prompted attention to the me- 
chanics of the token system instead of aca- 
demic work per se. Then when the token 
program was removed, the children may 
simply have had fewer distractions from their 
academic work. It is also possible that the 
particular type of token program implemented 
by Hewett et al. was so difficult for the teach- 
ers to implement that they became overcon- 
cerned with dispensing tokens, failed to shape 
the children’s behavior effectively, and paid 
little attention to the academic program per 
se. (A later section of this paper deals with 
the various means of delivering token and 
backup reinforcers.) It should be emphasized 
that the control conditions included the use 
of “verbal praise, complimentary written com-
ments on completed assignments, and award-
ing privileges for good work [Hewett et al.,
1969, p. 525].” In addition, the six class-
rooms were visited weekly by the three in- 
vestigators who gave teachers suggestions 
dealing with problem children. Since so many 
variables other than token and backup rein- 
forcers influence children's behavior, one can 
only speculate about the basis for the increase 
in task attention in the EC classes. The EC 
results of Hewett et al. stand in marked con- 
trast to the results of many other investigators 
that show that unless steps are taken to 
maintain appropriate behavior as a token 
system is withdrawn, appropriate behavior 
generally declines. 

Significant improvements in academic be- 
havior of children in special remedial classes 
have been reported by Clark, Lachowicz, and 
Wolf (1968) and Wolf, Giles, and Hall 
(1968). The latter investigators had a special 
remedial education program for 15 fifth- and 
sixth-grade children in an urban poverty area. 
The Stanford Achievement Test scores of the 
children in the token program increased 1.5 
years as compared to a median gain of 0.8
years for a control group (N = 15) who had
no remedial program. The token group showed 
a median increase of 1.1 grade points (report 
card grades) while the entire group increased 
only 0.2 points. 

Significant changes in the academic be- 
havior of delinquents have been reported by 
Tyler and Brown (1968) and Cohen.³
Cohen devised an environment for learning 
which encompassed the total day for 28 ju- 
venile offenders who were all school failures; 
85% of these offenders were school dropouts. 
They had an average penal sentence of 2.5 
years. Each student (offender) became a 
student educational researcher and worked 
on approximately 140 programmed education 
courses and in 18 programmed classes. The 
specially designed environment was full of 
choices not ordinarily available to a prisoner, 
such as money, private bedroom, and gifts.
The students “payed” for such choices by 
performing on tests with 90% accuracy for 
which they received tokens. Poor test per-
formance meant that the student went “on
relief,” slept on an open bunk, and ate from
a metal tray. In 90 hours of academic work,
the average gain of the students was 1.9 grade
levels on the Stanford Achievement Test and
2.7 grade levels on the Gates Reading Survey.

³ Cohen, H. L. Motivationally oriented designs for
an ecology of learning. Paper presented to American
Educational Research Association, New York, New
York, February 1967.

The studies reviewed here show that a token 
reinforcement program will significantly in- 
crease desired behaviors of a wide variety of 
students. However, a detailed examination of 
the behavior of individual children reveals 
that some children fail to change with the 
introduction of the token program (Kuypers 
et al., 1968; Zimmerman, Zimmerman, & Rus- 
sell, 1969). Nonetheless, one should not con- 
clude that such children's behavior could not 
be influenced by a token program; the in- 
vestigator may not have known or had con- 
trol of the appropriate variables. As Zimmer- 
man et al. (1969) noted, a “token reinforce- 
ment program . . . is indeed a complex set of 
procedures which demands of a teacher much 
attention and includes vast numbers of stimu- 
lus elements and environmental variables [p. 
11].” An analysis of some of this complexity
appears in the section on variables influencing 
token programs. 

A detailed examination of those behaviors 
most influenced by the token programs and 
those behaviors least influenced by token pro- 
grams is almost impossible. Investigators fre- 
quently use very broad categories of behavior 
such as study behavior, appropriate behavior, 
or disruptive behavior. The frequencies of
individual classes or subcategories of behavior 
have usually not been reported. Part of the 
reason for the absence of such detail may be 
the investigators' desire to conserve space in
the description of their research. More often, 
however, many investigators simply may not 
have adequately defined the various behaviors 
which comprise the more general behavior 
categories, and consequently there is no re- 
port of the subcategories, their respective re- 
liabilities, and their change or absence of 
change. In one notable exception to this pro- 
cedure, Thomas, Becker, and Armstrong 
(1968) reported subcategories of classroom 
behavior and their respective change in a
study concerning the effects of contingent
teacher attention. However, in the token
studies such data do not exist, and one can
only speculate about such behavior changes.
It appears that behaviors such as getting
out of one's seat, talking out of turn, and
turning around in one's seat are most likely
to change with the introduction of the token
program, whereas academic behavior would
be most difficult to change. In fact, the most
obvious dramatic changes in token programs
seem to have occurred in programs where non- 
academic behaviors served as the dependent 
variable. If token programs serve as a priming 
or incentive function, one would certainly ex- 
pect academic behaviors to be more difficult 
to change than social behaviors, since children 
in token programs frequently have the ap- 
propriate social behaviors in their repertoire 
but not the academic skills necessary to pro- 
gress without considerable instruction. 


Changes in Other Behavior 


O'Leary et al. (1969) found that attend- 
ance appeared to be enhanced during the 
token phases of their study, and Wolf et al. 
(1968) found that, without exception, chil- 
dren voted to have a remedial token program 
on regular school holidays. In addition, young
children may learn certain rules of exchange
in the token economy and thus learn some
form of bartering and trade. In fact, Bushell
et al. (1968) observed preschool children
lending tokens at interest. Children may learn
to become manipulative by modeling the
teacher's if-then statements. For example,
children might try to change other children's
behavior by saying, “If you clean that mess
up, I'll give you my candy bar.” Obviously
the extent to which a child models a teacher's
“control” procedures in a token classroom is
amenable to investigation, but even if model-
ing of the teacher’s behavior did occur, such
methods of peer control might be more ac-
ceptable than the more frequent methods of
coercion and aversive control displayed by
many disruptive children. However, to main-
tain appropriate behavior as a token program
is withdrawn, prompting with if-then state-
ments should be minimized since the natural
environment will neither incessantly prompt 
behaviors nor reinforce them with “arbitrary” 
(Ferster, 1967) reinforcers. 

One of the most frequent criticisms of a 
token program is that children will acquire 
the expectation that they should always re- 
ceive tangible reinforcers for any work they 
do. It seems quite possible that if a child 
were in a token program for 3 or 4 years, 
where tangible reinforcers were delivered on 
a frequent basis, and if the token program 
were suddenly stopped, the child would stop 
working, get angry, and ask for a reward for 
work he had done. However, this problem 
can be minimized by withdrawing the token 
program as quickly as possible and emphasiz- 
ing reinforcers which are intrinsic to any 
classroom, such as recess, privileges, and free 
time. 

The preceding outcome evaluation docu- 
ments the effectiveness of token reinforcement 
programs in producing behavioral change in 
diverse child populations. Indeed, the po- 
tential power of token programs seems to be 
partly indicated by the increasing number of 
schools and institutions initiating such pro- 
grams. Nonetheless, almost any researcher 
working with token programs can cite at least 
one example of a token program which failed 
in one way or another—often because of in- 
adequate supervision. Unless studies are ex- 
tremely well designed, it is difficult to draw 
conclusions from studies which show no dif- 
ferences between groups, and as a result fail- 
ures are infrequently published unless they 
are nonreplications. In fact, the present au- 
thors have not been able to find one published 
study that failed to find an effect of the token 
program when the token program was com- 
pared to a base period that did not contain 
a token program. However, the Hewett et 
al. (1969) study discussed previously might 
be considered a partial failure. A study by 
Kuypers et al. (1968) was a partial failure, 
and a token program of the senior author⁵
failed to effect changes in the behavior of two
disruptive children in a very unruly class.
Since such failures do occur it is beneficial to
look at some of the variables operating within
a token setting.

⁵ O'Leary, K. D., & Lima, P. A token system
failure. Unpublished manuscript, State University of
New York, Stony Brook, N. Y., 1968.


VARIABLES INFLUENCING THE EFFECTIVENESS
OF TOKEN PROGRAMS


Teacher 


Praise and ignoring. The effects of a token 
program may be influenced greatly by the 
frequency of a teacher’s praise. It has been 
shown repeatedly that teacher praise is an 
effective method of reducing disruptive be- 
havior (Hall, Lund, & Jackson, 1968; Mad- 
sen, Becker, & Thomas, 1968), and in a token 
study in which a teacher was not instructed 
concerning praise and shaping, some children 
were only minimally influenced by the token 
program, and consequently it was terminated 
prematurely (Kuypers et al., 1968). It can- 
not be emphasized strongly enough that praise 
should be given for approximations to the de- 
sired terminal response. It is folly to expect 
all children in a heterogeneously grouped class
to reach some absolute level of appropriate 
behavior before praise is given—even when 
the children are receiving tokens. 
Praising appropriate behavior has ordinarily 
been used in conjunction with ignoring most 
disruptive behavior. However, where there is 
a great deal of peer reinforcement of disrup- 
tive behavior, ignoring disruptive behavior 
may prove deleterious (O'Leary et al., 1969). 
Walker et al. (1969) provided evidence that 
when time-out from reinforcement and suspen- 
sion from school for extreme disruption were 
eliminated from a token program, disruptive 
behavior increased. One method which has 
been shown effective in reducing disruption in 
many young children is to reprimand a child 
so quietly that most of the other children in 
the class cannot hear the reprimand (O’Leary 
& Becker, 1968; O’Leary, Kaufman, Kass, 
& Drabman, 1970). Conversely, Thomas, 
Becker, and Armstrong (1968) provided evi- 
dence that suggests that frequent criticism of 
disruptive behavior, which probably was au- 
dible to at least several children, led to in- 
creases in disruptive behavior. Probably, low 
rates of general disapproval, soft reprimands, 
and high rates of praise would effect the most 
marked changes in behavior. 

Reinforcing value of teacher. As mentioned
in the introduction, it is assumed by some re- 
searchers that having a teacher praise children 
when she dispenses token and backup rein- 
forcers will help her acquire conditioned re- 
inforcing value. This assumption has not been 
tested in a classroom, but it has been demon- 
strated that a social stimulus can acquire re- 
inforcing properties for children when paired 
with food (Lovaas, Freitag, Kinder, Ruben- 
stein, Schaeffer, & Simmons, 1966; Perkins, 
1967). 

Teacher expectations. The effect of teacher 
expectancies on children's IQ scores was ex- 
pounded by Rosenthal and Jacobsen (1966, 
1968) who reported that simply telling a
teacher that a child had potential for gifted-
ness was related to large IQ gains in 1 year
—particularly in the first and second grades. 
Despite widespread popular reactions to this 
research and its consequent report in book 
form, the initial excitement generated by this 
research may be unwarranted (Snow, 1969). 
Devastating critiques of Rosenthal and Jacob-
sen's methodology have appeared recently
(Clairborn, 1969; Snow, 1969; Thorndike, 
1969), and a number of studies have failed 
to find changes in total IQ scores related to 
teacher expectancies (Anderson & Rosenthal, 
1968; Clairborn, 1969). In fact, some investi-
gators have found significant losses in various
types of IQ scores related to teacher ex- 
pectancies (Anderson & Rosenthal, 1968). To 
regard the expectancy effect as a potent de- 
terminer of a child’s classroom behavior seems 
unwarranted. At most, the evidence demon- 
strating such an effect is equivocal (Clair- 
born, 1969), and the teacher’s behavioral 
changes that result from receiving a “false” 
expectation are probably quite variable (And-
erson & Rosenthal, 1968; Beez, 1968; Clair-
born, 1969; Meichenbaum, Bowers, & Ross,
1969). Nonetheless, a slightly different type
of expectancy effect than that reported by
Rosenthal may be important for implementers
of token programs. Several anecdotal ex-
amples suggest that where a teacher does not
believe that a token system will be effective,
her success will be minimized.⁶ Thus, re-
searchers and administrators should be cau-
tious in forcing teachers to execute programs
that they feel, and possibly hope, will fail.

⁶ Paul Graubard, Yeshiva University, personal
communication, February 1969.

Teacher training. There has been no precise 
description of the training necessary for a 
teacher to implement a token program suc- 
cessfully, and the number of hours of psy- 
chological consulting time has not been speci- 
fied in reports of token programs. It has been 
demonstrated that a teacher can successfully 
reduce disruptive behavior in a classroom with 
a token program without being enrolled in a 
course emphasizing learning principles (Kuy-
pers et al., 1968; O’Leary et al., 1969). How-
ever, in these studies there was consultation
from the investigators. Some token programs
have provided a source of reinforcement to
the teacher such as a consulting fee, graduate-
course credit, and frequent visits from a
principal—carefully designed to show ap-
proval of the teacher's efforts. In parent-
training programs, Ray (1968) found that
only a modest proportion of parents actually
changed their behavior with their children
after successfully completing a programmed
text dealing with child management. Assum-
ing that teacher training has some of the
same problems in translating verbal behavior
into appropriate action, efficient teacher train-
ing would probably consist of (a) having the
teacher record her behavior as well as the
behavior of her class; (b) having the teacher
observe videotapes of effective teacher be-
havior, or having the experimenter model the
desired teacher behavior; and (c) having
direct feedback for the teacher in the class-
room or in a conference about her classroom
behavior.

Token reinforcement: Reinforcement or feed-
back? A teacher in a token program fre-
quently describes the appropriateness or in-
appropriateness of a child's behavior as she
gives the child tokens, ratings, or praise;
consequently, this feedback may be a critical
variable in the reduction of disruptive be-
havior. O'Leary et al. (1969) found that in-
creasing the frequency of praise was not
effective in reducing disruptive behavior.
Thus, one might infer that simply increasing
feedback in the form of praise is not always
sufficient to reduce disruptive behavior. Bu-
shell et al. (1968) instituted a token program
which involved the use of praise, tokens, and 
backup reinforcers. The teacher gave the child 
a token and said “good” or “that’s right” 
whenever she saw a child behaving appropri- 
ately. After several weeks of this procedure, 
the purchasing power of the tokens was 
eliminated, but the administration of praise 
and tokens remained in effect. When the pur- 
chasing power of tokens was eliminated, study 
behavior declined, and when the purchasing 
power of tokens was reinstated, study be- 
havior increased. Such results suggest that a 
token program serves more than just a feed- 
back function. However, it should be em- 
phasized that there are many ways of pro- 
viding feedback, and some procedures for ad- 
ministering praise and tokens may be as ef- 
fective as the use of praise, token, and backup 
reinforcers. For example, when a teacher pe- 
riodically gives feedback by going to each 
child's desk and evaluating his behavior with 
ratings or tokens that are not exchangeable 
for backup reinforcers, this feedback may be 
an effective procedure for changing children’s 
behavior. This periodic administration of 
feedback may be contrasted with a procedure 
in which the teacher is not required to respond 
regularly to each child but may provide feed- 
back at her own discretion. 


Child 

Number of children. The number of chil- 
dren within token programs in classroom set- 
tings with a single teacher has varied from 
6 to 21.⁸ Generally the number of children in
a token program is correlated with the se-
verity of the behavior problems in the class.
Since a token program provides additional re-
inforcement not ordinarily available in the
classroom, it is likely that a single teacher
could teach more children effectively with a
token program than without it. In addition,
it should be noted that disruption may be
greatly curbed if one starts a token program
with a small group of children and then adds
additional children as classroom control is
effected (Walker et al., 1969).

⁸ Because of the rapid growth of token programs
and the complexity of processes therein, token pro-
grams designed primarily for one child per classroom
were not reviewed. Interested persons should consult
Patterson, Shaw, and Ebner (1969).

Socioeconomic status. No precise specifica- 
tion of socioeconomic status has been made 
in any token study, but most classroom token 
programs have dealt with children who might 
be described variously as (a) institutionalized, 
(b) young, and/or (c) disadvantaged—all
children who might respond well to tangible 
reinforcers such as candy, toys, and money. 
The authors know of no studies dealing ex- 
clusively with upper-class students in junior 
or senior high school, but presumably an ef-
fective token program could be devised for
such students emphasizing factors such as 
prestige, social reinforcement, and special 
privileges. 

Age, IQ, and diagnosis. The studies re- 
viewed in this paper included children with 
IQ levels ranging from the retarded to “above 
average" range, and the children have ranged 
in age from approximately 4 to 21 years. 
Although change has been documented in 
academic and social behavior at all ages, 
there have been more token programs de- 
signed for young children since they probably 
respond to inexpensive backup reinforcers 
more readily than older students. Most token 
reinforcement programs have been conducted 
by experimenters who deemphasize classical 
diagnostic categories and who utilize an as- 
sessment of particular “target” behaviors 
which serve as the focus of the treatment 
(O'Leary, in press). However, it seems safe 
to say that token programs have increased 
academic and prosocial behavior in children 
traditionally seen as hyperactive (Quay et al., 
1967), retarded (Birnbrauer et al., 1965),
emotionally disturbed (O'Leary & Becker,
1967), and delinquent (Tyler & Brown, 1968).

Predictors of responsiveness to token pro- 
grams. There have been no studies which re- 
port any predictors of responsiveness to token 
programs. The failure to develop such pre- 
dictors on the basis of personality variables 
such as anxiety and compulsiveness might be 
seen by some as a conspicuous gap in the 
literature of token systems. However, per-
sonality variables of children have failed to
offer much predictive utility in most situa-
tions (O'Leary, in press). More importantly, 
many behaviorally oriented psychologists have 
rejected both traditional personality theory 
and traditional personality measures. The 
absence of predictors and the general avoid- 
ance of traditional personality measures does 
not mean that predictors of responsiveness 
to token programs would not be useful and 
could not be developed within the behavioral 
framework. However, the nature of these
predictors would probably differ considerably 
from traditional personality predictors such 
as anxiety and compulsiveness. It may even
be necessary to use certain assessment tools 
to predict responsiveness to the major token 
program with its frequent token and backup 
reinforcers and to use other assessment tools 
to predict maintenance of appropriate be-
havior after a token system is withdrawn.
Similarly, it will be necessary to have differ- 
ent predictors for different target behaviors, 
for instance, academic and social behaviors. 
One of the most obvious and possibly the 
most important variable in predicting changes 
in academic and social behavior during a 
token program is one's academic skill before 
entering a token program. More specifically, 
one would need to know the child's specific 
academic skills that are targets of manipula- 
tion in the token program, for instance, mathe-
matics skills if the token program is in a
mathematics lesson or reading skills if the 
token program is in a reading lesson. Knowl- 
edge of a child's pretoken academic behavior 
can be useful in predicting changes in his 
social behavior during the token program, 
since disruptive classroom behavior is often 
the result of academic deficiencies. Second, it 
is also helpful to assess the reinforcers in a
child’s natural environment so that one does
not use backup reinforcers in the token pro-
gram to which the child has ready access.
Third, responsiveness to contingent teacher
attention would probably be a good predictor
of the level of disruptive behavior in a token
program during any stage of its existence.
Fourth, the amount of savings a child has
acquired and the number of points a child
needs for the prize he desires seem highly im-
portant in predicting the level of disruptive
behavior while the token program is in effect.
Fifth, the initial frequency of a child's disrup-
tive behavior, the intensity of the disruptive
behavior, and the extent to which the dis-
ruptive behavior is maintained by peers would
probably be good predictors of the level of
disruptive behavior as the token program is 
withdrawn. 

It is possible that with large enough sam- 
ples one could obtain small but significant 
correlations between responsiveness to token
programs and personality variables such as
social desirability, locus of control, internaliza-
tion-externalization, and delay of gratification.
However, the present authors contend that
predictive measures such as those mentioned
in the previous paragraph would have equal
or greater predictability, and more impor-
tantly, they would provide more relevant in-
formation concerning necessary tailoring of
the token program to account for individual
differences. Despite opinions to the contrary,
operantly oriented treatment approaches seri-
ously emphasize certain types of individual
differences as seen most easily by the focus
on shaping, programmed instruction, and
testing stimuli to assure that they have re-
inforcing value for individual subjects. None-
theless, almost no research has been done on
the interaction between token programs and
individual differences, and such research could
be of both theoretical and practical relevance.


Parent 


Most published studies of token programs 
have not involved parents—usually for rea-
sons of experimental control. However, the
use of parents would probably enhance the
long-term effectiveness of token programs, and
the involvement of parents, though not ex-
perimentally analyzed, has been well illus-
trated by McKenzie, Clark, Wolf, Kothera,
and Benson (1968) and by Walker et al.
(1969). 


Systems of Reinforcement 

Token and backup reinforcers. The types
of token and backup reinforcers used in class-
rooms have varied considerably, but little at-
tention has been paid to the kinds of token
or backup reinforcers that might be effective
with particular populations. Most programs
have used check marks or ratings given by
the teacher. However, stars, rings, checks on
a card which a child carries, interlocking 
chips, light flashes, plastic chips, and tags 
have served as tokens. Some programs have 
even dispensed different-colored tokens for 
different behaviors and required different- 
colored tokens for different prizes or backup 
reinforcers (Wolf et al., 1968). 

In general, tokens should have the follow- 
ing properties: (a) Their value should be 
readily understood; (b) they should be easy 
to dispense; (c) they should be easily trans- 
portable from the place of dispensing to the 
area of exchange; (d) they should be identi- 
fiable as the property of a particular child; 
(e) they should require minimal bookkeep- 
ing duties for the teacher; (f) they should 
be dispensable in a manner which will divert 
as little attention as possible from academic 
matters; (g) they should have some relevance 
to real currency if one's desire is to teach 
mathematical or economic skills which will 
be functional outside the classroom; and (h)
they should be dispensable frequently enough 
to insure proper shaping of desired behavior. 
Probably the most important consideration in 
the choice of tokens for different populations 
is the mental age of the child and the ease 
with which the child can comprehend various 
aspects of the token system. For example, 
with retarded children one may first have to 
establish the value of a token by repeatedly 
exchanging the token for a reinforcer such as 
candy. Also, a rating for a retarded child 
might have less significance than a number of 
stars, check marks, or plastic tokens which 
he can always see or retain in his possession. 
However, a rating placed on a removable 
sheet in a booklet on each child's desk is more 
readily administered by a teacher than plastic 
tokens, and the child probably would spend 
less time attending to or playing with the 
rating than with plastic tokens. Consequently, 
with children who can understand and re- 
member the significance of a rating, ratings 
would probably be preferable to plastic chips. 
Where stealing, playing with plastic tokens, 
or tearing up a rating sheet is a problem, one 
might even place a rating for each child in 
a clearly visible place in the front of the 
class. Check marks may be particularly suit- 
able for reinforcing academic behaviors, for 
instance, number of answers correct or num-
ber of items completed. Ratings are preferable
to check marks for reinforcing social behaviors 
which are not as easily divided into discrete 
units. 
The backup reinforcers used in classroom 
token programs have varied greatly. Candy, 
small toys, and trinkets have been most com- 
mon, but in Cohen's (1967, see Footnote 3) 
program almost the entire environment was 
programmed to maximize the probability that 
academic behavior would be reinforced. 
Meals, social activities, clothing, toiletries, 
books, magazines, dormitory rooms, and even 
the quality of one's bed served as backup re- 
inforcers for academic achievement. Although 
there is an imperfect correlation between what 
a child says he likes and what in fact will 
reinforce his behavior, the senior author has 
found that simply asking a child what he will 
work for provides a very good basis for 
selection of backup reinforcers. Most im- 
portantly, having a wide variety of backup 
reinforcers makes it probable that at least 
one of the items will always serve as a rein- 
forcer for a particular child. The effectiveness 
of a token program is partly the result of 
constant changes in the types of backup rein- 
forcers. The imaginativeness of the teacher 
and classroom consultant in selecting rein- 
forcers intrinsic to the classroom environment 
becomes increasingly important as the tangible 
backup reinforcers are gradually withdrawn. 
Response costs. The use of cost procedures 
involving loss of tokens, point loss, and fines 
in token programs has been reported anec-
dotally since the inception of token pro-
grams. However, the effect of such procedures
has been documented only recently. In a
study by Phillips (1968), response cost pro-
cedures were clearly effective in reducing ag-
gressive statements, tardiness, and saying
"ain't" in a home-style rehabilitation setting.
The usefulness of cost procedures in a class- 
room was demonstrated by McIntire, Jensen, 
and Davis⁹ in an after-school program for 
elementary and junior-high-school boys, and 
Weiner (1962, 1969) has established its ef- 
fectiveness in basic research studies. How- 
ever, comparisons of cost procedures and posi- 
tive reinforcement in classroom settings have 
not been made. Whether cost procedures have 
undesirable side effects and whether the con- 
tingency manager becomes increasingly re- 
sponsive to undesirable behavior and ulti- 
mately more punitive are questions which de- 
serve special attention. 

9 McIntire, R. W., Jensen, J., & Davis, G. Control 
of disruptive classroom behavior with a token 
economy. Paper presented to Eastern Psychological 
Association, Washington, D. C., April 1968. 

Group versus individual contingencies. 
Tokens may be made contingent upon indi- 
vidual or class behavior. Unfortunately, evi- 
dence concerning the comparative effectiveness 
of these two types of contingencies is absent. 
Schmidt and Ulrich (1969) demonstrated the 
effectiveness of reinforcing a class with extra 
recess when the noise level of the entire class 
was below a certain level. However, it seems 
that shaping of certain behaviors could be 
best accomplished with individual rather than 
group contingencies. 

Probably, combinations of group and indi- 
vidual contingencies could be used to produce 
the greatest behavioral change, but group 
contingencies must be initiated with caution 
because of (a) the possibility that a par- 
ticular child cannot perform the requisite be- 
havior; (b) the resulting possibility of undue 
pressure on a particular individual; and (c) 
the possibility that one or two children may 
find it reinforcing to subvert the program or 
“beat the system.” 

Contingent versus noncontingent reinforce- 
ment. Children sometimes receive tokens 
and/or backup reinforcers independent of 
their level of appropriate behavior as a con- 
trol procedure. When token and backup rein- 
forcers are not made contingent upon ap- 
propriate behavior, the appropriate behavior 
deteriorates (Burchard, 1967). Furthermore, 
studies with individual children have shown 
that when differential reinforcement of re- 
sponses other than the target behavior (DRO) 
is instituted after a period of reinforcement 
of the target behavior, the frequency of the 
target behavior is greatly reduced (Baer, 
Peterson, & Sherman, 1967). In short, from 
studies using both “free” reinforcers and DRO 
controls, one can infer that the addition of 
some of the “good things in life” is not suf- 
ficient to increase appropriate behavior; the 
“good things” must be contingent upon “good” 
behavior to increase its frequency. 


METHODOLOGY 
Experimental Design 


Most demonstrations of the effects of token 
reinforcement programs have utilized within- 
subject designs. Following a base or pre- 
experimental period, the token program is 
instituted, withdrawn, and finally reinstated 
(ABAB design; base, token, base, token). If 
there is a decrease in disruptive behavior 
each time the token program is in effect and 
an increase when the program is withdrawn, 
it is usually concluded that the token pro- 
gram is effective in reducing disruptive be- 
havior. However, this ABAB design or vari- 
ants thereof is effective in determining the 
functional relationship between the token 
program and the children’s behavior only if 
other reinforcers such as teacher praise do not 
rapidly become effective in reducing disruptive 
behavior. If the token program is designed 
to increase the effectiveness of reinforcers 
other than the token and backup reinforcers 
as quickly as possible, an early withdrawal 
of the token program may be necessary to 
demonstrate that the program has a functional 
relation to the children's behavior. Without 
such an early withdrawal, the children may continue 
to show little disruptive behavior even when 
the token program is later withdrawn if teacher 
attention and some extra privileges are used 
to reinforce appropriate behavior. Most im- 
portantly, whether one's design emphasizes 
within- or between-subject controls, since a 
teacher's behavior can be so powerful on a 
minute-to-minute basis in the interim between 
the administration of the token reinforcers, 
teacher behavior should be carefully specified 
and/or controlled. Where there is a rapid 
transition from backup reinforcers to the usual 
classroom reinforcers such as teacher atten- 
tion and privileges, comparisons with children 
in control classes where alternative conditions 
are specified may be helpful and sometimes 
necessary to show the effect of the token 
program. 
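
The logic of the ABAB comparison described above can be illustrated with a minimal sketch. The function name, the use of phase means, and all daily percentages are assumptions introduced for illustration only; they are not taken from any of the studies reviewed.

```python
from statistics import mean


def abab_supports_effect(base1, token1, base2, token2):
    """Return True when mean disruptive behavior falls in both token
    phases and rises again when the program is withdrawn."""
    b1, t1, b2, t2 = (mean(phase) for phase in (base1, token1, base2, token2))
    return t1 < b1 and b2 > t1 and t2 < b2


# Hypothetical daily percentages of disruptive behavior.
reversal = abab_supports_effect(
    [70, 75, 72], [30, 25, 20], [60, 65, 62], [22, 18, 20]
)

# Same data, except that behavior stays low in the second base phase,
# as might happen if teacher praise has already become an effective reinforcer.
no_reversal = abab_supports_effect(
    [70, 75, 72], [30, 25, 20], [24, 22, 25], [22, 18, 20]
)
```

In the second case the check fails even though the program may well have been responsible for the original improvement, which is the limitation of the reversal design noted above.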

A related problem with reversals in natu- 
ralistic settings is the impossibility of a 
return to the earlier or base condi- 
tions after an experimental program has been 
effected. For example, after a base line, one 
may introduce the use of frequent attention 
contingent upon prosocial behavior. The in- 
vestigator may then ask the teacher to with- 
draw attention for prosocial behavior or to 
return to her former rate of attention for pro- 
social behavior. Although such designs have 
apparently been used successfully, a reversal 
or return to an original condition even when 
carefully monitored (and most are not) may 
be quite different from the original condition. 
For example, after a teacher has successfully 
implemented a praise condition she may be 
very reluctant to return to a procedure which 
was ineffective for her. In fact, she may have 
forgotten how she behaved formerly. Where 
reversals are impossible or clinically unjusti- 
fied, multiple base-line procedures, sequential 
introduction of variables such as classroom 
rules and praise, replications with other chil- 
dren and other teachers, and control groups 
may be useful alternatives to a return to 
earlier conditions. Although the single-subject 
design has many advantages (Sidman, 1960) 
and has helped people focus on variables which 
have practical import (Bijou, 1965), several 
basic research studies (Grice & Hunter, 1964; 
Willis, 1969) demonstrated that the type of 
experimental design, namely, within subjects 
versus between subjects, interacts strongly 
with reinforcement schedules, and thus one 
might reach different conclusions using the 
same parameters but in different designs. In 
summary, both designs have their merits and 
demerits, and one should be attuned to the 
possibility of different results if his variables 
were manipulated in an alternative design. 
Another problem with reversals in a within- 
subject design in a token study is that de- 
mand characteristics may greatly influence 
one's data. Consider the experiment by Bu- 
shell et al. (1969). “Above average" pre- 
school children were in a token program in 
which they earned tokens exchangeable for 
special events such as a party or a movie. 
Then the children were given “free” tokens 
in the morning as they arrived at school. 
Finally, the children were again placed on a 
token system in which tokens were made con- 
tingent on good behavior. It seems highly 
possible that unless a teacher tells her children 
she wants and/or expects them to be- 
have just as well in the free-token phase as 
they did during the contingent-token phase 
(O'Leary et al., 1969), the children may 
pick up subtle cues from the rationale given 
them when they receive "free" tokens that they 
should misbehave. 

There are no hard rules for determining 
lengths of conditions in within-subject de- 
signs, but more emphasis should be placed on 
predetermined lengths of conditions and/or 
variability not exceeding a certain figure, in- 
stead of simply changing conditions for rea- 
sons not made explicit. For example, consider 
a base condition with percentage of disruptive 
behavior as follows: 75, 70, 65, 60, 75. Al- 
though such figures do not reveal a continuing 
downward trend, it is possible that additional 
observations would have indicated such a 
trend which seemed to be occurring during the 
first 4 days of observation. Since increasing or 
decreasing trends and absences of children 
may sometimes preclude precise predetermina- 
tion of length of conditions, variability of be- 
havior not exceeding a certain figure may be 
a useful guide in deciding when to institute 
a new procedure. The criterion of stability 
has been emphasized by many investigators, 
but few if any experimenters have specified 
any bound or limits for the variability they 
will tolerate before changing procedures. 
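
One way to make such a bound explicit is sketched below. The window of five observations and the tolerance of 10 percentage points are arbitrary assumptions chosen for illustration, and the values appended to the base-line example from the text are hypothetical.

```python
def is_stable(percentages, window=5, max_range=10):
    """True when the last `window` observations differ by no more than
    `max_range` percentage points (both limits fixed before the study)."""
    if len(percentages) < window:
        return False
    recent = percentages[-window:]
    return max(recent) - min(recent) <= max_range


# Base-condition example from the text: the range is 15 points.
baseline = [75, 70, 65, 60, 75]
stable_now = is_stable(baseline)

# Hypothetical additional observations; the last five span only 5 points.
extended = baseline + [72, 74, 70, 73]
stable_later = is_stable(extended)
```

Under these assumed limits the investigator would continue the base condition after the fifth day and change conditions only once the criterion is met.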

Other variables which should receive con- 
sideration both in the design and analysis of 
any classroom study are attendance, vaca- 
tions, introduction of new students, timing 
of recess, parent conferences, change of seat, 
and constancy of presentation of educational 
subject matter. All of the above variables 
can critically influence the level of a child's 
disruptive behavior, and particular attention 
should be placed on constancy of the seating 
pattern and the educational material. The 
present authors have found it very useful to 
ask teachers at the beginning of a study 
about all the classroom variables that might 
influence particular children under observa- 
tion. Usually teachers will mention many of 
the above variables, and then they are asked 
to hold such variables constant in order to 
assess the influence of a particular experi- 
mental variable. Having the teachers suggest 
variables to be held constant has made it 
much easier to obtain control of such vari- 
ables than simply telling a teacher to control 
such factors at the outset of the study, and 
early discussion of some of the basic con- 
siderations of experimental design with the 
teachers can preclude naive but unfortunate 
change of seating patterns or subject matter. 


Dependent Measures 


Unfortunately, the dependent measures used 
in token programs have seldom been used in 
more than one laboratory. Although there 
have been a number of systematic replications 
with the same dependent measures by Madsen 
et al. (1968), concerning contingent praise in 
the classroom, and by O'Leary et al. (1969), 
concerning token programs, replication of 
these effects by different investigators is 
needed if only because of the problems of ob- 
server and experimenter bias. The extent to 
which observer bias influences dependent mea- 
sures in field-experimental settings has not 
been investigated systematically, but Scott, 
Burton, and Yarrow (1967) reported that 
“knowing the hypothesis of [an] experiment 
biased the observations of the informed ob- 
server in the direction of this hypothesis [p. 
57].” Until more data pertinent to observer 
bias are available, it seems best to delineate 
the coding categories very clearly, inform ob- 
servers of the bias phenomenon, and ask them 
to be as honest as possible. Furthermore, if a 
disguised videotape recorder can be used, pre- 
and posttreatment sessions can be recorded, 
the tapes can be edited to remove obvious 
demand characteristics, and then observers 
can look at the tapes in a randomly presented 
order.¹⁰ 
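
The random presentation of tapes to observers might be arranged along the following lines. This is only a sketch: the session labels, the tape code format, and the function name are hypothetical, and the key linking codes to sessions is assumed to be held by the experimenter and withheld from observers.

```python
import random


def blind_presentation_order(session_labels, seed=None):
    """Shuffle the sessions and assign neutral tape codes.

    Returns a dict mapping each code (in viewing order) to the true
    session label; observers are shown only the codes."""
    rng = random.Random(seed)
    order = list(session_labels)
    rng.shuffle(order)
    return {f"tape_{i + 1:02d}": label for i, label in enumerate(order)}


# Hypothetical pre- and posttreatment sessions.
sessions = ["pre_1", "pre_2", "pre_3", "post_1", "post_2", "post_3"]
key = blind_presentation_order(sessions, seed=0)
viewing_order = list(key)  # what the observers see: tape_01, tape_02, ...
```

Fixing the seed makes the order reproducible for later decoding; any seed, or none, would serve equally well.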


Schedule of Token and Backup Reinforcement 


Schedules of reinforcement have had ex- 
tremely powerful effects on the behavior of 
rats and pigeons (Ferster & Skinner, 1957). 
Where bar-pressing is used as a dependent 
measure, patterns of children's responses 
resemble the patterns of responding in infra- 
humans. For example, in children and infra- 
humans, a fixed ratio schedule is often as- 
sociated with pauses after reinforcement and 
later changes to high terminal rates. Similarly, 
variable ratio and variable interval schedules 
used with children often produce behavior 
which resembles that of infrahumans (Long, 
Hammack, May, & Campbell, 1958; Orlando 
& Bijou, 1960); variable ratio schedules pro- 
duce very high and steady rates of respond- 
ing. Variable interval schedules also produce 
steady rates of responding, but the rates are 
typically lower than those produced by vari- 
able ratio schedules. However, fixed interval 
schedules with children and adults do not al- 
ways mimic the typical schedule control dem- 
onstrated with animals who show scalloping 
and pauses after reinforcement (Orlando & 
Bijou, 1960; Weiner, 1969). No strict com- 
parison of classroom dispensing of token and 
backup reinforcement to classical studies of 
schedules of reinforcement can be made since 
few contingency operations in the classroom 
meet classical scheduling definitions. For ex- 
ample, under a fixed interval schedule, rein- 
forcement is made contingent upon the first 
response after a fixed interval of time has 
passed since a previous reinforcement. Re- 
sponses before that time has passed have no 
effect upon the occurrence of reinforcement 
(Weiner, 1969). O’Leary et al. (1969) utilized 
a rating procedure whereby the teacher would 
give ratings to children (exchangeable for 
backup reinforcers) at approximately fixed 
times just after the natural lesson break. It 
should be emphasized, however, that although 
ratings occurred at fixed times, these token 
reinforcers were not always made contingent 
upon the first appropriate response after a 
fixed interval of time had passed since the 
previous reinforcement. Those investigators 
who have blithely described various “schedul- 
ing” operations in their token programs may 
have fallen into the trap which Breger and 
McGaugh (1965) so heavily criticized, 
namely, attempting to associate their work 
with the prestigious field of learning and 
utilizing scientific sounding terminology in 
order to make their work appear scientifically 
respectable. Distinctions about schedule speci- 
fications should not be taken lightly, for even 
under tightly controlled conditions some 
scheduling effects with humans are not well 
understood, and the failure of "scheduling" 
effects in field-experimental settings to mimic 
laboratory results may be due partly to im- 
proper analogue use. The absence of any 
tightly controlled scheduling research in field- 
experimental settings may reflect not only 
methodological difficulties in meeting precise 
schedule execution but also the rationale that 
schedule control from token and backup rein- 
forcement is mitigated by numerous other 
reinforcers provided by teachers and peers. 

10 [...] relative to the “real” treatment [...] laborious 
and time-consuming process of edited, random 
presentations of videotapes [...] unnecessary. 
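
The distinction drawn above between a fixed interval schedule and ratings delivered at fixed times can be made concrete with a short sketch. The function names, the treatment of session start as the previous reinforcement, and the response times (in arbitrary time units) are all assumptions made for illustration.

```python
def fixed_interval_reinforcements(response_times, interval):
    """Fixed interval schedule: reinforce the first response occurring
    at least `interval` time units after the previous reinforcement.
    The session start is treated as the previous reinforcement (assumption)."""
    reinforced = []
    last = 0
    for t in sorted(response_times):
        if t - last >= interval:
            reinforced.append(t)
            last = t
    return reinforced


def fixed_time_deliveries(session_length, interval):
    """Deliveries at fixed clock times, independent of any response,
    which is closer to ratings given at each natural lesson break."""
    return list(range(interval, session_length + 1, interval))


# Hypothetical response times within a session of 130 time units.
responses = [5, 20, 61, 62, 100, 125, 130]
fi = fixed_interval_reinforcements(responses, interval=60)
ft = fixed_time_deliveries(session_length=130, interval=60)
```

Under the first rule the responses at 5, 20, 62, 100, and 130 have no effect on reinforcement, whereas under the second rule delivery does not depend on a response at all; only the first rule meets the classical definition quoted in the text.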


GENERALIZATION 


One of the most frequently asked questions 
of behavior modifiers concerns the extent to 
which improved behavior generalizes. The 
question of generalization is critically im- 
portant to those who are considering the de- 
velopment of token programs and who must 
concern themselves with the overall effect of 
the program and its general clinical ap- 
plicability. However, the question Does be- 
havior generalize? may be well-intentioned 
but naive. In fact, considering the manner in 
which most token programs have been con- 
ducted, generalization of appropriate behavior 
should not have been expected. In short, gen- 
eralization is not seen here as a magical pro- 
cedure or an explanation of a phenomenon 
but a description of a behavioral change 
which must be programmed like any other be- 
havioral change. As Baer, Wolf, and Risley 
(1968) aptly emphasized, “Generalization 
should be programmed rather than expected 
or lamented [p. 97].” Since children do not 
spontaneously acquire self-control techniques 
as a function of their exposure to the token 
program, since children’s behavior seems to be 
quite situation-specific, and since children's 
natural environments outside the token class- 
room do not ordinarily reinforce the children's 
appropriate behavior in a systematic manner, 
when the token program is removed it is 
likely that the children's appropriate behavior 
will decline. In order to document this posi- 
tion, an examination of some token programs 
where generalization was not programmed 
should be helpful. 

The answer to the question Does behavior 
generalize? depends upon whether one is in- 
terested in generalization across behaviors 
(same or different behavior), situations (same 
situation or different situation), time (present 
or future), or combinations thereof. Ob- 
viously when one considers the possible ques- 
tions one could ask concerning these three 
types of generalization and their logical com- 
binations, answers become extremely complex. 
Since few investigators in the token reinforce- 
ment area have directed their research at 
evaluation of generalization, let us consider 
only two types of generalization which seem 
critical in assessing the effects of token pro- 
grams and for which there is some evidence. 


Across Situations 


Those investigators who have assessed the 
generalization of behaviors reinforced in the 
token programs to those same behaviors when 
the token program is not in effect, in different 
situations and at different times, have not 
found generalization (Kuypers et al., 1968; 
Meichenbaum et al., 1968; O'Leary et al., 
1969). During the first few days of the 
O'Leary et al. (1969) token program there 
may have been some slight generalization 
from behaviors reinforced in the token pro- 
gram in the afternoon to identical behaviors 
in the nontoken period the following morning, 
but presumably as soon as the children real- 
ized that their "good" behavior did not pay 
off in the morning, such behavior extinguished. 
It should be emphasized that one might more 
easily observe generalization across situations 
with young children if a token program were 
in operation in the morning and generalization 
measures were obtained the afternoon of the 
same day, rather than 21 hours later, as was 
the case in the studies by Kuypers et al. 
(1968) and O'Leary et al. (1969). 

Generalization probably varies inversely 
with the specification of the reinforcement 
contingencies. Wolf et al. (1968) made rein- 
forcement contingent upon one type of be- 
havior and then made reinforcement con- 
tingent upon another behavior. The specifica- 
tion of contingencies was made clear to the 
children by informing them about the number 
of points they could obtain for correct answers 
(a) whenever the number of points was 
changed, (b) when a student inquired about 
the number of points he could earn, and (c) 
when the student completed an assignment. 
The children's behavior rose and declined 
very abruptly with changes in reinforcement. 
In fact, when one boy's points for reading 
were decreased, his reading rate fell almost 
to zero in one day. In contrast, where it is 
not immediately clear to the children that 
their behavior will no longer receive rein- 
forcement, their behavior would probably 
show generalization across situations, at least 
for a short time. 

11 When one has a token program for only one 
child in a class, one also must consider generalization 
across subjects. 


Across Time 


The within-subject design often used by 
operantly oriented behavior modifiers does 
not lend itself easily to assessment of long- 
term effects of token programs. For example, 
data obtained after withdrawal of a long 
token program cannot be compared to base- 
line data alone in making any firm conclusions 
about the effects of a token program since 
maturation, time in class, or other variables 
might have influenced such a change. Con- 
sequently, unless there are multiple base lines, 
no-treatment controls, or placebo controls, 
"follow-up" or “postcheck” data must be 
evaluated with caution. At present, no study 
presents evidence that when a child is re- 
turned to his regular class following a long- 
term token program, behavior is maintained 
at a rate greater than one would expect from 
control subjects. However, children who re- 
main in the token classroom with the same 
teacher after the token program has been re- 
moved seem to show some generalization of 
appropriate behavior (Hewett et al., 1969). 

Walker et al. (1969) reported in a footnote 
that their pilot data collected on two groups 
of children at 3- and 6-month intervals fol- 
lowing treatment in a token program sug- 
gested that treatment gains on the variable 
of task-oriented behavior did not maintain 
when the children returned to their regular 
classrooms the following year. They reported: 

At the start of the next school year, the project 
staff received requests from the school district to 
"do something" about the behavior of five out of 
eleven subjects who had received treatment in the 
token economy the previous year [p. 52]. 

A follow-up questionnaire sent to the new 
teachers of 16 of the 17 children that had 
been in the token program of O'Leary and 
Becker (1967) revealed that in November, 
11 of the 16 children did not exhibit unde- 
sirable social behavior such as talking back, 
temper tantrums, and refusing to follow in- 
structions. The remaining five children (three 
of the seven children systematically observed) 
were described variously as aggressive, stub- 
born, devilish, and talkative. Three children 
who were systematically observed in Novem- 
ber had an average of 36% deviant behavior 
as compared with an average of 79% during 
the base period of the token program in 
March. In short, without any instructions or 
advice to the new teachers of the target chil- 
dren, there was markedly less disruptive be- 
havior than before the onset of the token 
program the year before, but clearly not as 
little as there had been during the token 
program. In the token reinforcement studies of 
Walker et al. (1969), generalization across 
time and settings was obtained. More specif- 
ically, data were obtained (a) at a time of 
day during the token program when the stu- 
dents reported to the regular classes (non- 
token), (b) when the students were in the 
classroom where the token program had been 
instituted but after tokens had been removed, 
and (c) in the child’s regular (nontreatment) 
class 3 months after termination of treat- 
ment. Although there were no attempts to re- 
program the regular classroom environment 
to facilitate generalization to the child's regu- 
lar class while the token program was in op- 
eration, they reported that by the end of the 
fourth week of treatment, the children's task- 
oriented behavior was at a very high rate and 
indistinguishable in the two settings. Similarly, 
they found that after a group of children 
had been in a token program for some time, 
the withdrawal of the token program had very 
little effect on the children’s behavior. How- 
ever, follow-up data taken 3 months after the 
children had been in a token program indi- 
cated that “none of the subjects’ posttreat- 
ment behavior maintained as efficiently as 
their behavior during treatment [p. 72].” The 
behavior of the children in the regular class 
ranged from 39% to 97% of the level of 
appropriate behavior that they displayed in 
the treatment class. Despite these generally 
impressive results, it is not at all clear what 
led to the generalization that was reported. 
After the major treatment program, specific 
programs tailored to each individual subject’s 
behavior were written for the regular class- 
room teachers to follow in order that task- 
appropriate behaviors could be maintained, 
and these programs may have aided some 
long-term behavior maintenance. In addition, 
Walker et al. suggested that the increased 
academic skill may have been a potent factor 
in the generalization obtained. However, the 
data of Walker et al. differ from those of most in- 
vestigators, who failed to find generalization 
when they assessed the behavior of children in 
treatment and nontreatment settings simul- 
taneously, and results such as those of Walker 
et al. await replication. 


Suggestions for Achieving Generalization 


If, as Baer et al. (1968) emphasized, “gen- 
eralization should be programmed, rather than 
expected or lamented [p. 97],” in order to 
achieve maximal behavior generalization, a 
number of procedures must be implemented 
simultaneously. Simultaneous implementation 
of procedures precludes definitive research in 
the treatment setting which precisely isolates 
factors that enhance generalization. Analogue 
treatment research and/or basic research from 
areas such as education, experimental psy- 
chology, and social psychology need to be 
consulted for clues for enhancing generaliza- 
tion. Since there is so little data indicating 
that generalization can be obtained following 
a token program, the following suggestions are 
offered for achieving generalization: 


1. Provide a good academic program since 
in many cases you may be dealing with de- 
ficient academic repertoires—not "behavior 
disorders." If the child has the requisite aca- 
demic skills when he returns to his regular 
classroom, the probability of his engaging in 
disruptive behavior should be minimized 
(Walker et al., 1969). If he does not have 
the requisite academic skills, he will probably 
engage in disruptive behavior unless he has 
simply learned to sit "doing nothing." 

2. Give the child the expectation that he 
is capable of doing well by exaggerating ex- 
citement when the child succeeds and pointing 
out that if he works hard he can succeed. In 
addition, children can be given the expecta- 
tion that they will be able to work without a 
token program as they grow older and more 
mature. 

3. Have the children aid in the selection of 
the behaviors to be reinforced, and as the 
program progresses have the children in- 
volved in the specification of contingencies— 
a procedure effectively used by Lovitt and 
Curtiss (1969). 

4. Teach the children to evaluate their own 
behavior. 

5. Teach the children that academic 
achievement will "pay off." For example, pick 
something you know a child likes to do, such 
as look at a comic book, and explain that he 
will be able to read the comics if he studies 
hard. 

6. Involve the parents. Largely for reasons 
of control, most token programs have not in- 
volved parents, but their use has been well- 
illustrated by McKenzie et al. (1968) and by 
Walker et al. (1969). 

7. Withdraw the token and backup rein- 
forcers gradually and utilize other “natural” 
reinforcers existing within the classroom set- 
ting, such as privileges (O'Leary et al., 1969; 
Osborne, 1969). 

8. Reinforce the children in a variety of 
situations and reduce the discrimination be- 
tween reinforced and nonreinforced situations. 

9. Prepare teachers in the regular class to 
praise and shape the children's behavior as 
they are phased back into the regular classes, 
and bolster the children's academic behavior— 
if needed—with tutoring by undergraduate or 
parent volunteers (Staats, Minke, Goodwin, & 
Landeen, 1967; Thomas, Nielsen, Kuypers, 
& Becker, 1968; Walker et al., 1969). 

10. Last, in order to maintain positive gains 
from a token program, it may help to look at 
the school system as a large-scale token sys- 
tem with the distribution of token and backup 
reinforcers extending from the school board 
to the superintendent, to the principal, to the 
teacher, and finally to the children. When 
viewed in such a manner, the consultant or 
research investigator should attempt to fa- 
cilitate the process of reinforcement not only 
for the children but for the teachers, the 
[principal, and the school board. Praise to a] 
teacher from a principal, frequent feedback 
and follow-up results given to the principal, 
and some publicity about the program in local 
papers sent especially to school board mem- 
bers are but a few of the interactions which 
may serve to maintain interest in both the 
long- and short-term effects of token pro- 
grams. 


CAUTIONARY NOTE 


Unfortunately, the generally powerful na- 
ture of token reinforcement programs has led 
many people to apply such programs hap- 
hazardly. However, as this review has em- 
phasized, the implementation of a token pro- 
gram is a very complex undertaking, and 
such programs should be executed only with 
proper consultation. The use of token pro- 
grams employing reinforcers such as candy, 
kites, and prizes not readily available to a 
teacher should be approached with particular 
caution. In fact, if a decision is made to im- 
plement a token program one should first at- 
tempt to utilize activities available to any 
teacher such as free time, extra recess, stories, 
and special privileges as backup reinforcers. 
Furthermore, emphasis on token and backup 
reinforcers should not make one myopic to 
the wide variety of other variables in a class- 
room which can be used to aid a teacher in 
achieving her objectives. A token program is 
but one procedure, albeit a very powerful 
one, for improving classroom behavior. 


The nature of similarity effects in short-term memory has been a focal point of 
investigations that attempt to discover fundamental similarities or differences 
between short-term memory and long-term memory. A conclusion common to 
many of these experiments is that phonemic similarity has a much larger effect 
than semantic similarity in short-term memory, while semantic similarity is 
more important in long-term memory. Typically, this fact is used to support 
the argument that short-term memory is phonemic in nature while long-term 
memory is basically semantic. A review of the literature on phonemic and 
semantic similarity in short-term memory revealed that semantic encoding is 
readily demonstrated in short-term-memory tasks, but only when the task 
requires it, or when slow rates of incoming information are used. To account 
for this feature of the data it was proposed that the encoding of information 
into short-term memory is a time-dependent process which must be traded off 
or time-shared with rehearsal. The encoding of phonemic information is as- 
sumed to be faster than semantic encoding, hence leaves more time for re- 
hearsal, which is a useful strategy in a short-term-memory task. Thus, encoding 
will be primarily phonemic in short-term memory unless task demands require 
semantic encoding. 


A critical issue in the study of human verbal 
memory has been the nature of the relation- 
ship between short- and long-term memory. A 
focal point of this issue is the utility of as- 
suming theoretically distinct information stor- 
age systems in order to explain the character- 
istics of performance in short- and long-term 
memory experiments. The answer to this ques- 
tion is complicated by the fact that the the- 
oretical states short-term store and long- 
term store cannot be examined directly. In- 
stead, experimental procedures are used which 
differentially weight the expected contribution 
to performance of information in short-term 
and long-term store. Experiments that weight 
short-term store most heavily share certain 
operational characteristics, such as the use of 
short retention intervals and the requirement 
to retain a small amount of material presented 
once for a brief period of time. These pro- 
cedures have been used to define short-term 
memory operationally (cf. Melton, 1963), and 
arguments about the nature or usefulness of 
short-term store usually are based on simi- 
larities or differences between performance in 
operationally defined short-term memory and 
operationally defined long-term memory. 
The effect of similarity variables on long- 
term memory is a well established fact. For 
this reason the role of similarity in short-term 
memory has been the subject of a sizable 
number of experiments aimed at resolving 
questions about the nature of short-term store. 
A conclusion common to many of these experi- 
ments is that acoustic similarity has a much 
larger effect than semantic similarity in short-
term memory, while semantic similarity is
more important in long-term memory. While 
a large body of research has been generated 
on this question, no systematic review is 
available. The present paper constitutes such 
a review and presents an evaluation of the 
empirical findings with respect to theoretic 
issues in human verbal memory. It is argued 
that existing data on similarity effects con-
strain any theory of short-term-memory per-
formance in important ways, but are not criti-
cal for the question of whether short-term
and long-term store constitute useful con- 
cepts. The basis for this conclusion is stated 
now and justified in later sections. 

Similarity is accorded a major role in both 
the interference theory of long-term memory 
(Keppel, 1968; Postman, 1961) and in as- 
sociative theories of transfer of learning (Os- 
good, 1949). The basic constructs of inter- 
ference theory are response competition, un- 
learning, and spontaneous recovery, which 
produce the phenomena referred to as proac- 
tive and retroactive interference. The amount 
of proactive or retroactive interference ob- 
tained in any experiment is postulated to be 
a function of stimulus and response similarity, 
in accordance with the empirical laws of 
transfer performance as stated by Osgood 
(1949) and Martin (1965). For these reasons, 
attempts to investigate the nature of short- 
term memory and short-term store have in 
large part consisted of experiments on simi- 
larity-based interference effects such as pro- 
active and retroactive interference. In the 
literature that is reviewed here, it has com- 
monly been assumed that the effects of a 
given type of similarity (e.g., acoustic or 
semantic) must be the same in short-term 
and long-term memory in order for a theory 
using a single storage system to explain both 
types of behavior. The argument that is de- 
veloped in the present paper is that differ- 
ences in the quality of traces in short-term- 
memory and long-term-memory experiments 
may be explained by the operation of a time- 
dependent encoding process whose nature 
does not depend at all on the assumptions 
made about the information storage systems 
involved. 

In the following sections the effects of 
several kinds of similarity on retention are 
reviewed. First, it is useful to clearly define 
these dimensions. Many studies have in- 
vestigated the role of acoustic similarity in 
short-term memory. The term acoustic appears 
to be a misnomer since there is some reason 
to believe that the salient dimension may be 
articulatory similarity (Hintzman, 1965,
1967). In order to avoid prejudging this issue,
the term phonemic similarity
(Wickelgren, 1965c) is used. Several similar-
ity dimensions related to linguistic or semantic
properties have been studied in short-term 
and long-term memory. The use of words be- 
longing to different superordinate categories 
(wolf versus apple) and the use of materials 
drawn from different formal classes (con-
sonants versus digits) are common manipula-
tions in short-term-memory experiments. Both
are referred to as conceptual similarity, since 
their effects seem entirely equivalent. Other 
similarity dimensions are described as needed. 


PHONEMIC SIMILARITY EFFECTS


Phonemic Similarity Effects in Immediate 
Recall 


Phonemic similarity has been studied by 
several short-term-memory methods, but most 
frequently in the immediate recall paradigm. 
Generally, increased phonemic similarity 
among the elements of a string of items has 
the effect of decreasing the number of ele- 
ments correctly recalled in the correct posi- 
tion. Conrad (1959) categorized the errors 
made in this paradigm into transpositions, 
omissions, and substitutions and has shown 
that substitution errors made after visual 
presentation of consonants correlate highly 
with confusion errors made in listening to the 
same materials embedded in noise (Conrad, 
1964). In two briefly reported experiments 
(Conrad, 1963) the size of the vocabulary 
from which stimulus sequences were con- 
structed was a much less critical determinant 
of recall than the number of phonemically 
confusable elements in the vocabulary. These 
experiments used word strings instead of con- 
sonants and involved the sequential visual 
presentation of five-word strings at a 1-second
rate. Further procedural details were not re-
ported, but Conrad and Hull (1964) reported
a similar experiment, using letter sequences,
in which vocabulary size (three versus nine
letters) and phonemic similarity within vo-
cabularies were varied factorially. In order
to equate element availability, the vocabulary
used on each trial was displayed on the recall
sheet. Phonemic similarity was shown to be
a more powerful determinant of recall than
sheer vocabulary size. This was considered an
important finding because Brown (1959) and
others had proposed information theoretic
models of short-term store which placed a 
great emphasis on vocabulary size. Conrad, 
Freeman, and Hull (1965) compared the ef- 
fects of phonemic confusability and sequential 
redundancy (digram frequency) on the im-
mediate recall of visually presented six-con-
sonant strings. Phonemic similarity was by
far the more powerful variable, which was
taken to indicate the relative unimportance
of a linguistic factor (digram frequency) in
short-term memory.

The immediate recall paradigm has also 
been used to study phonemic similarity ef- 
fects using aural presentation. Wickelgren 
(1965a) required subjects to copy mixed
strings of letters and digits during presenta-
tion, in order to eliminate listening errors from
the data. An extensive error analysis showed
that substitution errors were phonemically re-
lated to the forgotten item, and the likelihood
of forgetting a letter was an increasing func-
tion of the number of other phonemically
similar letters in a sequence. In another ex-
periment (Wickelgren, 1965c) poorer ordered
recall was shown for strings of phonemically
similar consonant-vowel digrams than for
strings of dissimilar materials. When recall
was scored without respect for order, phonemi-
cally similar strings were better retained than
dissimilar strings, indicating that phonemic
similarity increases the availability of ele-
ments, while decreasing the ability to repro-
duce their order of arrival.
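
The two scoring rules contrasted in this paragraph (recall scored in the correct position versus recall scored without respect for order) can be illustrated with a minimal Python sketch. The function names and the digram strings are hypothetical and chosen only for illustration; they are not Wickelgren's materials or scoring procedure.

```python
from collections import Counter

def score_ordered(presented, recalled):
    """Count items recalled in their correct serial position."""
    return sum(p == r for p, r in zip(presented, recalled))

def score_unordered(presented, recalled):
    """Count items recalled regardless of position (multiset overlap)."""
    overlap = Counter(presented) & Counter(recalled)
    return sum(overlap.values())

# Hypothetical trial: four digrams sharing a vowel, first two transposed at recall.
presented = ["ba", "da", "ga", "pa"]
recalled = ["da", "ba", "ga", "pa"]

ordered = score_ordered(presented, recalled)      # only "ga" and "pa" are in position
unordered = score_unordered(presented, recalled)  # all four items are available
```

A transposition of this kind lowers the ordered score while leaving the unordered score untouched, which is why the two measures can move in opposite directions as similarity increases.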

The nature of phonemic confusions in im- 
mediate recall has been studied in detail by 
Wickelgren (1965b, 1965d, 1966a) using aural 
presentation at relatively fast presentation
rates. When sequences of consonant-vowel di-
grams were used (Wickelgren, 1965d), the
most frequent intrusion errors were for di-
grams constructed of the same two phonemes
in different orders. Intrusions based on vowel
similarities did not depend on the position of
the common vowel in two digrams, but con-
sonant similarity between two digrams was
a greater source of error when the shared con-
sonant was in the same position in both
digrams. This led Wickelgren to suggest that
the encoding of a digram consists of a vowel
representation and a combined consonant plus
position representation. Two further studies
(Wickelgren, 1965b, 1966a) showed that the
systematic errors occurring in immediate 
memory for vowels or consonants are well 
predicted by a distinctive feature system with 
three dimensions: voicing, nasality, and open- 
ness of the vocal tract. These may be either 
acoustic or articulatory codes, and Wickel- 
gren’s data were not sufficient to choose be- 
tween these alternatives. 

Hintzman (1965, 1967) has reported two 
experiments aimed at distinguishing between 
acoustic and articulatory coding. Both studies 
used visual presentation and the immediate 
recall technique. In the first, stimulus se- 
quences were constructed using phonemically 
similar pairs of consonants and digits (e.g., 
Q-2, T-3). Errors within and between con- 
ceptual classes of materials were systemati- 
cally based on phonemic similarity, and, in 
the few cases where two items were not 
acoustically similar but did share articulatory 
features, confusions were more frequent than 
chance. Hintzman reported that this was also 
true in Conrad's (1964) analysis of phonemic 
confusions. Another interesting result was that 
intraclass confusions occurred with greater 
than chance frequency, indicating that the 
conceptual nature of materials is also repre- 
sented in short-term memory. Hintzman’s 
second experiment (1967) was based on the 
fact that errors made in listening to letters 
embedded in noise are consistently correlated 
with similarity on the dimension of voicing 
but not with place of articulation. Hintzman 
assumed that performance on such listening 
tests is based on an acoustic representation of 
the stimulus. Hence, if phonemic confusions 
in short-term memory are also based on an 
acoustic code, confusions based on similarity 
on the place of articulation dimension should 
not occur. If articulatory coding is used in 
short-term memory, these confusions should 
occur, and in fact they did in Hintzman’s ex- 
periment with greater than chance frequency. 
Hintzman took these data to support the 
hypothesis of articulatory coding in short- 
term memory and, by implication, in short- 
term store. However, Wickelgren (1969) 
pointed out that certain of Hintzman’s as- 
sumptions are questionable. In particular he 
noted that whether or not performance on a 
listening test is based on an acoustic code is
in itself a currently debated point, the al-
ternative position being that speech percep-
tion is based on articulatory coding. The fact
that errors based on place of articulation are 
relatively infrequent on listening tests is at- 
tributed by Wickelgren to a differential sensi- 
tivity to the masking effects of noise between 
the voicing and place of articulation dimen- 
sions. This accounts for the difference in er- 
ror patterns between listening tests and Hintz- 
man’s short-term-memory test where no mask- 
ing noise was used, and leads one to interpret 
Hintzman’s data as supporting the hypothesis 
that similar codes are used in both paradigms.
Whether these codes are acoustic or articula-
tory, however, remains a moot point.
Experiments using the immediate recall 
technique have established the salience of 
phonemic similarity in short-term memory, 
and provide useful information about the en- 
coding process involved. It is clear that intra- 
unit phonemic similarity causes a decrease in 
the recall of items in their correct order, and 
there is some evidence (Wickelgren, 1965c) 
that the availability of items is enhanced by 
phonemic similarity. The latter finding is 
apparently not due to an increased guessing 
probability since the manipulation of similar- 
ity in this experiment was accomplished by 
varying the number of consonant-vowel di- 
grams that shared the same vowel rather than 
by varying the phonemic similarity of the 
consonants themselves. Since recall was for 
consonants only, comparisons between simi- 
larity conditions were not confounded with 
the phonemic similarity of the set of con- 
sonants used. Because other experiments either 
provided the vocabulary of items at the time 
of recall or used the same vocabulary on re- 
peated trials, little confirming evidence for 
the finding of increased item availability as 
a function of similarity is available. The im- 
mediate recall method provides no evidence 
about the time course of forgetting and makes 
it difficult to separate the effects of processes 
occurring during storage and retrieval. A rela- 
tively small number of studies using other 
methods do provide information of this sort.


Phonemic Similarity Effects in Delayed
Recall


The time course of short-term forgetting 
has most frequently been studied with the 
method used by Brown (1958) and Peterson 
(Peterson & Peterson, 1959). Murdock (1967) 
has named this the distractor technique since 
the activity used to fill the retention interval 
is intended to distract the subject from re- 
hearsal and is not itself to be remembered. 

A comparison of forgetting rates for pho- 
nemically similar and dissimilar words was 
attempted by Baddeley (1968) using the dis- 
tractor method. In order to make such a 
comparison it is necessary to equate immedi- 
ate retention in the various similarity condi- 
tions, but increased phonemic similarity is 
known to depress immediate recall with re- 
spect to an equally long control string of 
items. In order to deal with this problem 
Baddeley used phonemically similar strings of 
three words and control strings of five words. 
Immediate retention in both cases was about 
75%, scored as correct words in the correct 
position, and the forgetting rates over reten- 
tion intervals between 2 and 16 seconds did 
not differ for phonemically similar and dis- 
similar lists. This comparison is difficult to 
evaluate because of the differences in length 
between similar and dissimilar strings and the 
consequent ambiguity of percentage correct 
as a dependent variable. In terms of per- 
centages, immediate recall was equated, but 
in terms of the number of words recalled in 
the correct position, control strings were re- 
called better. Similarly, there was more and 
faster forgetting for dissimilar strings when
number of correct words recalled was used
as the dependent variable. The interpreta-
tion of Baddeley’s result seems a moot point
because of these ambiguities. A more useful
method for investigating the relative rates
of forgetting of various kinds of information
is the probe method as used by Bregman
(1968). In this experiment subjects viewed a
long list of words intermixed with test items
consisting of rhymes, graphic cues, and con-
ceptual cues. The test items served as
probes. For example, if “rose” was a stimulus
word the probe might be “sounds like …,”
“… is a flower,” or “… is spelled ro _ _.”
The rate of forgetting of these three
types of information was similar over reten- 
tion intervals ranging from 3 to 288 seconds, 
although there were significant differences at 
the longer intervals. 
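
The ambiguity noted above in Baddeley's (1968) comparison can be made concrete with a short arithmetic sketch in Python. Only the string lengths (three and five words) and the approximate 75% immediate retention come from the text; the 25-point drop is a hypothetical value used solely to show how equal percentage losses translate into unequal losses in number of words.

```python
def words_correct(proportion_correct, string_length):
    """Convert a proportion correct into a number of words in position."""
    return proportion_correct * string_length

# Immediate retention of about 75% in both conditions (from the text).
similar_immediate = words_correct(0.75, 3)   # three-word similar strings
control_immediate = words_correct(0.75, 5)   # five-word control strings

# Hypothetical equal forgetting of 25 percentage points in both conditions.
drop = 0.25
similar_loss = words_correct(drop, 3)
control_loss = words_correct(drop, 5)
```

Equated percentages thus correspond to 2.25 versus 3.75 words at immediate test, and an identical percentage decline means more words lost from the longer control strings, which is the sense in which the two dependent variables disagree.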

In one of the few studies investigating
several degrees of similarity in short-term
memory, Wickelgren (1966b) used the dis-
tractor method to investigate short-term pro-
active and retroactive interference as a func-
tion of phonemic similarity. The task was to
retain a single "critical" consonant preceded
by 0, 4, 8, or 16 other (distractor) consonants
and followed by 0, 4, 8, or 16 others. The
proportion of distractor letters phonemically
similar to the critical item was either 0, .25,
.50, .75, or 1.0. All letters were copied during
presentation, which was rapid (two letters per
second), aural, and sequential. The results
were complex but consistent in showing that
both proactive and retroactive interference in- 
creased as a function of phonemic similarity. 
The number of distractor items was effective 
in increasing retroactive but not proactive in- 
terference. This is probably because increas- 
ing the number of retroactive distractor items 
also increases the retention interval. The re- 
sults imply a two-factor forgetting theory in 
which proactive interference is due simply to 
an increased number of similar alternatives in 
storage, or response competition, and retro-
active interference reflects both this factor and
either trace decay or unlearning as in the in- 
terference theory of long-term memory. 

The hypothesis that proactive and retroac- 
tive interference are due to competition be- 
tween traces at the time of recall is a reason- 
able explanation of forgetting which is con- 
sistent with the similarity effects being dis- 
cussed. The loss of information that occurs as 
retention interval increases may reflect an 
increase in such competition caused either by 
the spontaneous recovery from unlearning of 
prior traces or the decay over time of the to-
be-remembered item. Conrad (1967) pointed
out that the decay and recovery-from-unlearn-
ing hypotheses make different predictions
about the temporal course of phonemic con-
fusions in short-term memory. On the spon-
taneous recovery hypothesis, confusion errors
should be random at the beginning of a re-
tention interval and become systematically
based on phonemic similarity as competing
traces recover their strength later in the re- 
tention interval. On the decay hypothesis, 
nonrandom confusion errors should be found 
at short retention intervals, and randomness 
should appear as decay proceeds and distinc- 
tive features are lost. In both cases the 
strength of the to-be-remembered item rela- 
tive to its competitors will be lower at the 
longer of two retention intervals; hence, more 
errors should occur at longer retention in- 
tervals. The prediction, then, concerns the 
proportion of all errors that are systematically 
based on phonemic similarity. A simple dis- 
tractor experiment in which the retention of 
consonant quadrigrams was measured after 
either 2.4 or 7.2 seconds of number reading 
clearly supported the decay hypothesis. Sys- 
tematic phonemic confusions were only found 
at the 2.4-second retention interval. Conrad 
pointed out that the unlearning and recovery 
hypothesis was compatible with his data only 
under the assumptions that the degree of un- 
learning was inversely related to similarity 
between competing traces, and that the rate 
of recovery from unlearning was a direct func- 
tion of this variable. Under these assumptions 
phonemically similar aspects of an unlearned 
item might cause errors at an earlier point in 
a retention interval than dissimilar aspects of 
the same unlearned item. These assumptions, 
however, seem opposite to the spirit of inter- 
ference theory which generally includes the 
notions that both competition and unlearning 
are direct functions of similarity. To assume 
otherwise would, for example, disallow the 
common assumptions that retroactive inter- 
ference in the distractor paradigm is mini- 
mized by using dissimilar materials for the 
critical item and the distractor task, or that 
retroactive interference in long-term memory 
is less for the A-B, C-D paradigm than the 
A-B, A-C. Conrad's experiment seems im- 
portant and interesting since it provides what 
seems to be a converging operation for dis- 
criminating between the decay and unlearn- 
ing hypotheses. 
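
The dependent measure at issue in this argument, the proportion of all errors that are systematically based on phonemic similarity, can be sketched as follows. The confusability classes and the trial data are invented for illustration and are not Conrad's (1967) materials or results; the sketch only shows how the measure would be computed at two retention intervals.

```python
# Hypothetical confusability classes (letters with rhyming names); an assumption.
CLASSES = [set("BCDGPTV"), set("FSX"), set("MN")]

def is_similar(target, response):
    """True when two different letters fall in the same confusability class."""
    return target != response and any(
        target in cls and response in cls for cls in CLASSES
    )

def systematic_proportion(trials):
    """Proportion of errors that are phonemically similar to the target."""
    errors = [(t, r) for t, r in trials if t != r]
    if not errors:
        return None  # measure is undefined when no errors were made
    return sum(is_similar(t, r) for t, r in errors) / len(errors)

# Invented (target, response) pairs for a short and a long retention interval.
short_interval = [("B", "P"), ("T", "D"), ("F", "F"), ("M", "N"), ("S", "K")]
long_interval = [("B", "K"), ("T", "M"), ("F", "P"), ("M", "N"), ("S", "S")]

short_prop = systematic_proportion(short_interval)
long_prop = systematic_proportion(long_interval)
```

On the decay hypothesis this proportion should fall as the retention interval lengthens, as in the invented data above; on the spontaneous recovery hypothesis it should rise.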

Phonemic similarity has also been investi- 
gated with paired-associate techniques. Bruce 
and Murdock (1968) varied the phonemic 
similarity between stimuli in a probe paired- 
associate short-term-memory task. Six word-
pairs were presented visually at a 2-second
rate, and a retention test was given immedi- 
ately after presentation of the last pair by 
presenting one of the stimuli alone. In each 
list, two pairs had phonemically similar stim- 
uli, differing by one distinctive feature in their 
first phoneme. There was a marked effect of 
recency and a significant effect of similarity 
at the longer retention intervals, but only 
when the second of the two phonemically 
similar pairs was tested. Thus, phonemic simi- 
larity caused proactive interference but not 
retroactive interference, a result which argues 
against any account of the retrieval process 
based on a stimulus simply eliciting its re- 
sponse. Such an associative hypothesis pre- 
dicts equal interference in Bruce and Mur- 
dock's proactive interference and retroactive 
interference conditions. If retrieval is viewed 
as a systematic search of short-term store 
(Yntema & Trask, 1963), then these data are 
more easily interpreted. Suppose first that 
errors arise when two similar stimuli have 
partially decayed and are not perfectly dis- 
criminable. This will produce a strong recency 
effect and a greater number of errors on pho- 
nemically similar items. Second, suppose the 
retrieval process consists of a series of com- 
parisons between the probe or test stimulus 
and the stored representations of stimuli, 
which starts with the first presented item and 
proceeds until a successful match occurs at 
which time the response associated with the 
matched stimulus is given. Since this search 
proceeds from the first presented item toward 
the most recent, the first presented of two 
similar pairs will always be encountered be- 
fore the second presented item. When the 
first pair is the tested item, it will be either 
recognized and recalled or an error will be 
committed before the search process reaches 
the second and similar item. Hence, no ad- 
ditional errors due to phonemic similarity will 
occur on a test of the first of two presented 
items (retroactive interference). On some 
proportion of the trials where the second 
presented pair is tested, a false recognition of 
the stimulus in the first presented pair will 
occur before the search reaches the second 
and similar item, causing errors that are in 
addition to those produced by failure to recog-
nize and retrieve the second pair itself. Since
increased phonemic similarity will produce in-
creased numbers of these false recognitions,
there should be an effect of stimulus similarity
on retention of the second of two presented
pairs (proactive interference). This explana-
tion makes use of the ad hoc assumption
that the search process starts at the beginning 
of the list, hence is not critically tested by 
the data. It is offered merely as an example 
of the possible explanatory usefulness of the 
search construct for information retrieval. In 
any event, Bruce and Murdock’s data are 
difficult to explain with classical associative 
models. 
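
The forward search account described above can be expressed as a small probability sketch in Python. It is an illustration under simplifying assumptions, not a model proposed by Bruce and Murdock: the recognition probability `p_hit`, the false-match probability `p_false`, the rule that dissimilar stimuli never produce false matches, and the example words are all hypothetical, and recency and decay are ignored.

```python
def p_correct(stimuli, target_index, similar_to, p_hit, p_false):
    """Probability of a correct response under a forward, self-terminating search.

    stimuli      : stored stimulus words in presentation order
    target_index : position of the pair being tested
    similar_to   : dict mapping a word to the set of words it can be confused with
    p_hit        : assumed probability of recognizing the target once it is reached
    p_false      : assumed probability of falsely matching a similar stored stimulus
    """
    target = stimuli[target_index]
    p_reach = 1.0  # probability the search arrives at the target with no false match
    for stored in stimuli[:target_index]:
        if stored in similar_to.get(target, set()):
            p_reach *= 1.0 - p_false
    return p_reach * p_hit

# Hypothetical six-pair list; "pin" and "bin" differ in one feature of the first phoneme.
stimuli = ["pin", "cat", "bin", "dog", "sky", "map"]
similar_to = {"pin": {"bin"}, "bin": {"pin"}}

first_tested = p_correct(stimuli, 0, similar_to, p_hit=0.8, p_false=0.3)
second_tested = p_correct(stimuli, 2, similar_to, p_hit=0.8, p_false=0.3)
control = p_correct(stimuli, 1, similar_to, p_hit=0.8, p_false=0.3)
```

Because the search always meets the earlier member of a similar pair first, similarity costs nothing when that member is tested (no retroactive interference) but lowers accuracy when the later member is tested (proactive interference), which is the asymmetry the paragraph describes.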

The concept of retrieval as a process of 
searching short-term store and making a series 
of comparisons between whatever retrieval 
cues have been provided and stored informa- 
tion is also useful in accounting for similarity 
effects in recognition tasks. Wickelgren 
(1966c) studied retroactive interference in 
short-term recognition memory using a probe 
technique. The task involved the presentation 
of a single critical letter for retention fol- 
lowed by 12 interference letters varying in 
their similarity to the critical letter. The 
single probe letter was either correct, incor- 
rect and phonemically similar to the critical 
letter, or incorrect and dissimilar. In some 
conditions the critical letter was repeated as 
one of the interference letters, and when this 
was done there was a sizable negative effect 
of phonemic similarity on correct recognition 
of the critical letter. There was also an in-
crease in the false-recognition rate for pho-
nemically similar probes. In another study of
short-term recognition, Reicher, Ligon, and
Conrad (1969) used both a two-alternative
forced-choice procedure and a yes-no pro-
cedure to examine memory for phonemically
similar words. In both procedures there was
evidence for large increases in false recogni-
tion rates for phonemically similar materials.


Phonemic Similarity Effects in Long-Term 
Memory 


The effects of phonemic similarity on long-
term memory have not been extensively in-
vestigated, but the results available are in-
ternally consistent. Dallett (1966) reported
four experiments investigating the effect of
phonemic similarity between stimuli on the
acquisition and retention of paired-associate
lists. A 1-week retention interval was utilized, 
and phonemic similarity was varied both 
within and between lists. Between-list pho-
nemic similarity had little effect. Within-list
similarity retarded acquisition and depressed
retention at the 1-week interval. The effect
on retention possibly is due to differences be-
tween similarity conditions in the terminal
level of acquisition, although this is by no
means obvious. However, the depression of
acquisition itself indicates an effect of simi-
larity on long-term store, since the time in-
terval between presentation of similar pairs
was not less than 8 seconds in the within-list
similarity condition, and probably averaged
about 40 seconds (assuming randomized
orders of the 12 paired associates were used).
Bruce and Murdock (1968, Experiment II)
examined long-term memory for paired as-
sociates in a retroactive interference design.
Again, the phonemic similarity of List 1 and
List 2 stimuli affected neither acquisition nor
retention. In a series of four experiments on
serial list learning, Baddeley (1966a) also
found little effect of phonemic similarity on
long-term memory. These data are consistent
in the sense of showing no effect of phonemic
similarity when manipulated as a between-
list variable in paired-associate tasks or as a
within-list variable in serial learning. Dallett’s
(1966) experiment indicates that long-term
store is affected by phonemic similarity manip-
ulated as a within-list variable in a paired-
associate task. Phonemic attributes therefore
can be represented in long-term store, but the
boundary conditions of this process are not
clear.


LIMITS ON PHONEMIC EFFECTS IN
SHORT-TERM MEMORY 


Degree of Forgetting 


The boundary conditions for phonemic sim-
ilarity effects in short-term memory are just
slightly clearer than in long-term memory.
Conrad's (1967) distractor experiment indi-
cates that phonemic confusions are most
prevalent at short nonzero retention intervals,
when partially decayed short-term-store traces
are most likely to exist. Most of the evidence
on phonemic similarity effects in short-term
memory has been gathered using immediate
recall techniques. The effective retention in- 
terval in an experiment of this type actually 
is on the order of a few seconds, since the 
first presented items must be stored during 
subsequent presentations and the last pre-
sented items are stored during recall of the
first part of the sequence. Bruce and Mur- 
dock’s results (1968, Experiment I) showed a 
large effect of phonemic similarity at reten- 
tion intervals longer than either Conrad’s 
(1967) or any immediate recall experiment. 
This provides an interesting insight into the 
nature of retrieval processes in short-term 
memory. In a paired-associate experiment 
such as Bruce and Murdock's, the retrieval 
cue made available to the subject consists of 
a nominal replica of one of the stored stimu- 
lus items. In order to retrieve the response 
correctly in a paired-associate task, it is 
first necessary to correctly recognize the re- 
trieval cue as a match for a particular stimu- 
lus. Martin (1967) has clearly demonstrated 
the importance of this stimulus recognition 
process in paired-associate retention, by dem- 
onstrating that recall of the response member 
of a paired associate is no better than chance 
whenever the stimulus member is not cor- 
rectly recognized. Recall also can be con- 
ceived of as a process of matching retrieval 
cues with stored information, the difference 
between recall and recognition being largely 
in the number of retrieval cues provided at 
the time of test. Thus, in the distractor ex- 
periment very few retrieval cues are pro- 
vided, in an experiment like Bregman's (1968) 
more cues are provided, and in a yes-no recog- 
nition experiment the maximum amount of 
retrieval information is provided by presenta- 
tion of a nominal replica of the stored item 
itself. Failure to retrieve information cor- 
rectly indicates a failure of the matching 
process between retrieval cues and stored in- 
formation which may logically be due either 
to a loss of stored information or inadequate 
retrieval cues. An experiment such as Bruce 
and Murdock's, which provides a relatively 
complete set of retrieval cues, requires a 
greater loss of stored information to produce
the same number of erroneous matches and
confusion errors than one such as Conrad's
(1967) or an immediate recall task where
retrieval cues are few. Hence, there is a dif-
ference between the retention interval neces-
sary to show effects of phonemic confusions
in the probe paired-associate procedure and 
the distractor or immediate recall procedure. 


Amount of Time for Encoding 


Other boundary conditions for phonemic 
similarity effects are harder to establish. Con- 
rad, Baddeley, and Hull (1966) found no 
interaction of the phonemic confusion effect 
with presentation rate in an immediate recall 
task. In this experiment sequences with high 
internal phonemic similarity were compared 
to low-similarity control sequences. Many 
more errors were made on phonemically con- 
fusable lists, but there was no significant ef- 
fect of presentation rate on either type of 
sequence. Since the manipulation of presenta- 
tion rate theoretically may affect rehearsal, 
decay, or recovery from unlearning, and the 
time available to encode items distinctly and 
since these processes have opposite effects on 
trace strength, it is perhaps not surprising 
to find little net effect of rate. 

The hypothesis that the encoding process 
is affected by the manipulation of presenta- 
tion rate is important for an understanding of 
similarity effects in short-term memory. Sev- 
eral experiments in which either presentation 
rate or stimulus duration was varied provide 
evidence consistent with this time-dependent 
encoding hypothesis. Many of these experi- 
ments were recently reviewed by Aaronson 
(1967); hence, the present review focuses on 
those experiments that manipulate similarity 
variables as well as the amount of time avail- 
able for encoding. Laughery and Pincus 
(1966), in an immediate recall experiment 
using consonant strings, studied phonemic sim- 
ilarity effects as a function of presentation 
modality (visual versus aural), presentation 
rate (20, 60, or 180 items per minute), and 
sequence length (6 or 8 items). Ordered re- 
call varied inversely with presentation rate, 
inversely with length, and was superior with 
aural presentation, but only at the fastest 
rate. This last finding is consistent with the 
hypothesis that retrieval is largely on the 
basis of phonemic features which are difficult 
to extract from a rapidly presented visual 
stimulus. This explanation predicts a larger
effect of phonemic confusability on visually
presented material at slower rates, an inter-
action that did not occur in Laughery and
Pincus’ experiment. Instead, there was no
effect of phonemic similarity at the slowest 
presentation rate, possibly because of the 
compensating effect of rehearsal. An experi- 
ment by Eagle and Ortof (1967) reported an 
interaction between phonemic confusability 
and amount of time available to process the 
stimuli. A list of 26 words was presented 
aurally and tested with a forced-choice recog- 
nition procedure. The amount of time avail- 
able for stimulus encoding was manipulated 
by requiring subjects in one condition to per- 
form a digit-coding task during list presenta- 
tion. The effect of this time-sharing require- 
ment was to increase the incidence of pho- 
nemically based false recognitions. This indi- 
cates that encoding may be a serial process 
with phonemic features coded first and se- 
mantic features later. When processing time 
is limited, more reliance is placed on phonemic 
features as discriminanda between competing 
traces. Further discussion of this time-dependent
encoding hypothesis is delayed until
a review of semantic similarity effects has 
been given. 


Method of Presentation 


Finally, Adams, Thorsheim, and McIntyre 
(1969) were unable to find reliable effects of 
phonemic similarity on short-term recall with 
the distractor method when the materials 
(consonant bigrams, trigrams, and quadri- 
grams) were presented simultaneously and 
subjects were instructed to find natural lan- 
guage mediators as retrieval aids. In another 
experiment comparing simultaneous and sequential
visual presentation of the items in a
consonant string, they found an effect of
phonemic similarity on the immediate recall
of sequentially presented consonants and on
the delayed recall of simultaneously presented
materials. The finding that phonemic coding
is not used when instructions to use natural
language mediators are given suggests that
subjects have some control over the way in
which information is encoded. The interpretation
of the effects of sequential versus simultaneous
presentation is unclear, and the need
for a better understanding of the conditions
of presentation as boundary conditions for
phonemic similarity effects is obvious.


SEMANTIC SIMILARITY EFFECTS 
Semantic Similarity in Immediate Recall 


The undeniable salience of phonemic simi- 
larity in short-term memory has been used as 
an argument against the theoretic similarity 
between short-term store and long-term store 
(cf. Conrad, 1967). In particular, the short- 
term trace is claimed to be primarily acoustic 
or articulatory while the long-term trace is 
primarily semantic. A sizable number of ex- 
periments have been reported which deal with 
the effects of semantic similarity in short-term 
memory, in order to disprove this claim. 
Whereas phonemic similarity has most fre- 
quently been studied using immediate recall 
techniques, the Brown-Peterson distractor 
method has been used almost exclusively in 
studies of semantic similarity in short-term 
memory, although there are important exceptions.
Since phonemic and semantic similarity
have for the most part been studied using 
different short-term-memory methods, any 
general statement about the relative impor- 
tance of the two in short-term store requires 
careful scrutiny. The retention intervals most 
frequently used in studies of semantic simi- 
larity with the distractor technique are on 
the order of 10-20 seconds, which presents an 
ambiguous situation with respect to the rela- 
tive contributions to performance of short- 
term store and long-term store. Thus, results 
obtained with this paradigm, which has been
extremely useful in determining the functional
properties of performance in short-term memory,
are of questionable value for studying the
properties of short-term store or for deter- 
mining whether short-term store is a useful 
concept at all. Immediate recall and probe 
methods are more useful for this purpose. 

Unfortunately, the effects of semantic or
linguistic factors on immediate recall have 
not been extensively studied. Conrad et al. 
(1965) failed to find very much effect of 
digram frequency on immediate recall. Baddeley
(1966b) compared phonemic and semantic
similarity and found a small but
significant decrement in ordered recall for
semantically similar adjectives. Phonemic similarity
had a very large negative effect. Unfortunately,
there is the possibility that the
effect of semantic similarity in this experiment 
was artifactually minimized. The vocabulary 
of words used was always provided at the 
time of recall because the experimenter's in- 
terest was in ordered recall and not item 
availability. A strategy available in the se- 
mantic similarity conditions was simply to 
memorize the first letter of each presented 
word, regenerating the words themselves at 
the time of recall. This was not possible in the 
phonemic similarity conditions since four 
words shared each first letter used. 

In an immediate recall experiment Harden 
(1929) varied conceptual similarity by ma- 
nipulating the similarity of the two halves of 
an eight-item sequence. The first four items 
were always consonants, and the second four 
were either 0, 1, 2, 3, or 4 digits and 4, 3, 2, 1, 
or 0 consonants. When four digits were used
there was a marked increase in retention of 
the entire string, as compared to the other 
conditions which differed only slightly. This 
result thus provides only partial support for 
the hypothesis of an effect of conceptual simi- 
larity. Finally, Schwartz (1966) showed that 
immediate recall increased when the string 
of items presented was composed of blocks of 
items conceptually similar within but not between
blocks (e.g., four words plus four
digits). The comparison was to an unblocked 
string of the same items. Blocking therefore 
facilitated conceptual encoding, which pre- 
sumably decreased the functional similarity 
between items and enhanced retention. 

The available evidence does not seem to 
justify the statement that short-term store, as 
measured by immediate recall, is unaffected 
by semantic similarity. The magnitude of 
semantic effects with this procedure does seem 
slight in comparison to the effects of phonemic 
similarity, but comparisons across dimensions 
are difficult since it is impossible to know 
when equal degrees of semantic and phonemic 
similarity have been achieved. 


Semantic Similarity Effects in the Distractor
Paradigm

Retroactive interference. It has already 
been noted that the distractor method is somewhat
ambiguous with respect to the role of
short-term store and long-term store. Still, results
obtained with this procedure are interesting
and important for a theory of
memory. Brown (1958) tested the retention 
of four consonant-pairs preceded or followed 
by interference pairs which were either con- 
sonants or digits. He found a reliable effect 
of the similarity of the interference material, 
and a given amount of interfering material 
prior to the critical four pairs caused less for- 
getting than the same amount presented after 
the critical pairs. In other words, retroactive 
interference was greater than proactive inter- 
ference. It is unclear whether the similarity 
variable manipulated here should be con- 
sidered conceptual similarity or phonemic sim- 
ilarity since consonants are more likely to be 
phonemically similar to each other than to 
digits. Corman and Wickens (1968) reported 
a somewhat similar result with respect to con- 
ceptual similarity. The recall of a consonant 
trigram was measured after a 10-second re- 
tention interval filled with the reading of 
either 12 digits or 10 digits and two con- 
sonants. The two consonants came either early 
or late in the recall period. There was sig- 
nificantly more retroactive interference when 
consonants were read than when only num- 
bers were read, and there was a nonsignificant 
tendency for consonants presented early to 
interfere more than consonants presented later 
in the retention interval. A third investigation 
of the role of similarity in retroactive inter- 
ference was reported by Dale and Gregory 
(1966). Both phonemic and conceptual simi- 
larity were studied in the retention of word 
trigrams over a 4-second retention interval. 
The interpolated material was either similar 
or dissimilar to the critical trigram, and the 
similarity dimension used was either con- 
ceptual or phonemic, resulting in four groups. 
Both types of similarity caused sizable decre- 
ments in retention, and the effect of phonemic 
similarity was slightly greater than conceptual 
similarity. Since this experiment used words, 
there is no question of conceptual similarity 
being confounded with phonemic similarity as 
is true for consonants and digits. Hence, it is 
reasonable to conclude that retroactive interference
in short-term memory does increase
with conceptual similarity.

Proactive interference. There are several
studies of short-term proactive interference 
in the extant literature. Brown's (1958) ex- 
periment has already been mentioned. Keppel 
and Underwood (1962) gave a clear demon- 
stration of the effect of prior material on re- 
tention in the distractor paradigm. No reten- 
tion loss at all was found on the first trial of 
their experiment, and there was an increase in 
the effect of retention interval from Trial 1 to 
Trials 2 and 3. The absence of forgetting on 
Trial 1 was also reported by Loess (1964) 
and by Cofer and Davidson (1968) under 
experimental conditions similar to those of 
Keppel and Underwood. The first trial result 
is sometimes cited as evidence against all 
models of short-term store which include a 
time-dependent decay process, but in fact it is 
possible to formulate decay assumptions that 
are not inconsistent with these data. In order 
to do this, it is necessary to include assump- 
tions about the role of similarity in short-term 
forgetting. Assume first that trace strength 
decays over time to a nonzero asymptote. 
Second, assume that recall of some critical 
item, $x_i$, is a direct function of the strength
of $x_i$, $s(x_i)$, at the time of attempted recall,
and an inverse function of the strength and
similarity of other recently presented items,
collectively denoted $x_k$. This second assumption
is equivalent to the assumption of response
competition incorporated in the interference
theory of forgetting and may be formalized
as follows:

$$P_R = \frac{s(x_i)}{s(x_i) + \sum_{k \neq i} s(x_k)}$$

where $P_R$ is the probability of a correct recall,
and the term $\sum_{k \neq i} s(x_k)$ represents the summed
strengths of those recently presented items
similar to the critical item along at least one
encoded dimension. In the absence of similar
items the term $\sum_{k \neq i} s(x_k)$ drops out of the above
expression; hence, performance will be perfect
as long as $s(x_i)$ remains greater than zero.
Since the strength asymptote is assumed to be
nonzero, no forgetting at all will be predicted
in the complete absence of competition from
similar items. In the experiments by Keppel
and Underwood (1962), Loess (1964), and
Cofer and Davidson (1968) the conditions on
Trial 1 permit the assumption that $s(x_k)$ is
negligible in comparison to $s(x_i)$ over the time
intervals measured, since no practice items 
were given prior to Trial 1 and material of 
very low similarity to the critical item was 
used to distract the subject during the reten- 
tion interval. The absence of an effect of re- 
tention interval on the first trial of a dis- 
tractor experiment and the gradual increase 
in the retention interval effect from Trial 1 
to Trial 2, and from Trial 2 to Trial 3 is 
correctly interpreted as a buildup of proactive 
interference (Keppel & Underwood, 1962), 
but the finding of proactive interference in 
short-term memory is not inconsistent with
the assumption of autonomous, time-depen- 
dent trace decay. In order to explain simi- 
larity-based effects such as proactive inter- 
ference, it is necessary to assume some such 
mechanism as competition between traces in 
addition to either time-dependent decay, or 
recovery of prior items from unlearning, also 
a time-dependent mechanism. Since both types 
of theory account for similarity effects in 
short-term memory, these effects shed little
light on the issue of whether or not short- 
term and long-term store operate on the 
same principles. Both types of theory place 
great emphasis on the role of similarity as a 
parameter of performance in short-term memory
tasks; hence, studies in which degree of
similarity between items is manipulated are of
some interest.
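
The competition ratio described above can be illustrated numerically. The following is a minimal sketch, not a model fitted to any of the experiments cited. It assumes an exponential decay toward a nonzero asymptote (the text specifies only that the asymptote is nonzero), and the function names, parameter values, and competitor strengths are all hypothetical.

```python
import math

def trace_strength(t, initial=1.0, asymptote=0.2, rate=0.3):
    """Strength of a trace t seconds after presentation.

    Assumed form: exponential decay toward a nonzero asymptote.
    All parameter values are hypothetical.
    """
    return asymptote + (initial - asymptote) * math.exp(-rate * t)

def recall_probability(target, competitors):
    """P_R = s(x_i) / (s(x_i) + sum of s(x_k) over similar items)."""
    return target / (target + sum(competitors))

def retention_curve(intervals, competitors):
    """Recall probability at each retention interval (in seconds)."""
    return [recall_probability(trace_strength(t), competitors) for t in intervals]

intervals = [3, 9, 18]
# Trial 1: no similar items have been presented, so nothing competes.
trial_1 = retention_curve(intervals, competitors=[])
# Trial 3: two similar prior items, each assumed to sit at the asymptote.
trial_3 = retention_curve(intervals, competitors=[0.2, 0.2])
```

Under these assumptions the Trial 1 curve is flat at 1.0 even though the target trace is decaying, whereas the Trial 3 curve falls as the retention interval lengthens. This is the sense in which a decay assumption can coexist with the absence of forgetting on the first trial.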

Release from proactive interference. Wick- 
ens, Born, and Allen (1963) demonstrated 
that the conceptual similarity of prior ma- 
terials is a major determinant of the proactive 
interference effect in short-term memory. In 
this experiment a series of distractor trials 
was given using either consonant or digit 
trigrams, recalled after an 11-second reten- 
tion interval during which the subject named 
colors; 3, 6, or 9 trials in succession used one 
class of materials, and on Trial 4, 7, or 10 a
trigram from the other class was presented.
The retention of the new class of material
was markedly greater than the retention of 
trigrams drawn from the same class of ma- 
terials used on the first 3, 6, or 9 trials. 
Hofer (1965) explored this release from proactive
interference using retention intervals
of 5, 11, and 17 seconds, and switching on
Trial 4 either from letter trigrams to word 
trigrams, or vice versa. The change from one 
class of materials to another caused an im- 
provement in performance and eliminated the 
significant effect of retention interval found 
in a control condition where no change of 
materials occurred. 

The phenomena of proactive interference 
and proactive interference release provide im- 
portant information about the encoding proc- 
ess in short-term memory. Loess (1967) used 
the distractor method with word triads as the 
critical items, to be retained over a 9-second 
retention interval. A triad was either homoge- 
neous or heterogeneous with respect to the 
conceptual category from which words were 
drawn. Eight conceptual categories were used, 
and the set of three homogeneous trials from 
each category was either presented on three 
successive trials (blocked) or was intermixed 
with triads from other categories. In the 
blocked conditions there was a larger proac- 
tive interference effect than in the series of 
intermixed trials, and a release from proactive 
interference occurred on the first trial of each 
new block. There are at least two possible ex- 
planations for the greater amount of interfer- 
ence found in blocked conditions. First, there 
is less time between the presentation of similar 
triads in blocked as compared to mixed trials. 
On the assumption of time-dependent trace 
decay it is possible that blocking results in 
stronger competing traces at the time of re- 
call. However, short-term recall tends to be 
asymptotic beyond 8 or 10 seconds, and the 
interval between successive presentations or
recalls exceeds this range even on blocked 
trials. It is therefore hard to see how compet- 
ing traces can be much reduced in strength 
by increasing the time between presentations 
of competing traces beyond the value used 
in Loess's blocked trials. The recovery-from- 
unlearning hypothesis is also inadequate be- 
cause unlearning does not occur until the 
time of presentation of the second of two 
similar stimuli. Hence, the interstimulus in- 
terval used is irrelevant to the degree of re- 
covery from unlearning. A hypothesis that 
will account for the greater interference in
blocked conditions is that blocking facilitates
the encoding of conceptual similarity between
triads. To the extent that there is overlap between
the stored representations of successive
triads, interference is increased and forgetting 
occurs. Release from interference occurs when 
successive triads are encoded with little over- 
lap, as in the case of switching from one block 
of conceptually similar triads to a triad with 
different conceptual properties. These specu- 
lations are supported by another experiment 
reported by Loess (1968). The blocking of 
trials using word triads drawn from the same 
conceptual categories produced more proac- 
tive interference than a mixed series of trials, 
and proactive interference release was ob- 
served on the first trial of each new block. 
Whatever the theoretic explanation is, the 
phenomena of proactive interference and re- 
lease from proactive interference are useful 
indicators of the way in which information is 
coded in short-term memory and possibly 
short-term store. A series of experiments by
Wickens and his associates has used this 
method to gauge the salience of several simi- 
larity dimensions in short-term memory. 
Wickens and Clark (1968) constructed tri- 
ads using words varying along three dimen- 
sions of the semantic differential (Osgood, 
Suci, & Tannenbaum, 1957). In a series of 
distractor trials with a 15-second retention 
interval, proactive interference and proactive 
interference release occurred as a function 
of the similarity of prior material along 
these dimensions. Wickens and Eckler (1968) 
used the proactive interference release tech- 
nique to study the effects of conceptual simi- 
larity under conditions of equal phonemic 
similarity. In a series of 11 distractor trials 
for both word and consonant trigrams, Trials 
8, 9, and 10 used consonant trigrams and the 
eleventh, critical, trigram was a consonant 
trigram (e.g., J-R-C) in one group and the 
word homophones of these consonants (JAY- 
ARE-SEA) in another group. The switch to 
words on Trial 11 resulted in a significant re- 
lease from proactive interference in com- 
parison to the group that had a consonant 
trigram on Trial 11. This demonstrates again 
the salience of conceptual similarity, but 
the results do not support the authors’ claim 
that phonemic similarity is unimportant in 
short-term memory. The effect of conceptual 
similarity is simply in addition to any effect
of phonemic similarity in their experiment.
Wickens, Clark, Hill, and Wittlinger (1968) 
used similar methods to demonstrate that 
the grammatical form class from which words 
are drawn is not a salient similarity dimen- 
sion in short-term memory. 


Semantic Similarity Effects in Other 
Paradigms 


Several variants of the probe memory 
method have been used to investigate se- 
mantic similarity in short-term memory. Bad- 
deley and Dale (1966, Experiments II and 
III) presented from two to six paired-associate
items and tested retention by presenting
a single stimulus item as a recall probe. Stimulus
similarity between pairs was varied by
using adjectives of varying semantic simi- 
larity. Responses were words whose similarity 
was minimized. There was no significant effect 
of stimulus similarity on performance. Dale 
(1967) also found no effect of semantic simi- 
larity in retention for paired associates, with 
similarity manipulated as a stimulus variable. 
Dale's result is especially puzzling since 
both of his experiments used retention inter- 
vals greater than 1 minute, which should mean 
that the relative contribution to performance 
of long-term store was very large. 

Other investigators have been more success- 
ful in using probe methods to demonstrate 
semantic similarity effects in short-term mem- 
ory. Calfee and Peterson (1968) presented a 
list of eight words, four from each of two 
categories, and probed for recall by using a 
word's ordinal number in presentation. They 
found significant improvements in perform- 
ance as a result of providing the relevant 
category names both before and after presen- 
tation of the list. 

In two experiments which studied the ef- 
fects of conceptual similarity in short-term 
memory, Ligon (1968) used a serial probe 
method, in which the Nth item in presenta- 
tion is given as a recall cue for the N + 1th 
item. Sequences of consonants and digits, 
presented visually at a rapid rate (three per 
second), were vocalized during presentation 
to permit the elimination of perceptual con- 
fusions from the data. In Experiment I the 
number of consonants in a 12-item sequence
varied from 3 to 12, with conceptually similar
items grouped into 3-item blocks. Retention
decreased as a function of the number
and proximity of conceptually similar blocks.
In Experiment II each sequence contained six
consonants and six digits, in blocks of two,
and the effects of proximity between similar
blocks were studied in detail. Again, retention
decreased as a function of the proximity
in presentation of conceptually similar blocks.

Semantic effects in short-term memory have 
also been demonstrated by requiring subjects 
to group stimuli into semantic classes at re- 
call. It is possible that the semantic coding 
that occurs in this case is a retrieval phe- 
nomenon rather than the result of encoding 
processes occurring during information storage.
Gray and Wedderburn (1960) presented
mixed lists of digits and either words or 
syllables dichotically and asked for recall in
any order. The words (syllables) formed 
meaningful phrases (words) if the subject 
alternated his attention from one ear to the 
other during presentation. Analysis of the
recall protocols showed that subjects were 
able to use this sequential constraint to order 
and improve their recall. Yntema and Trask 
(1963, Experiment I) reported a similar result
using mixed strings of digits and unrelated
words. Instructions were given requiring
recall grouped by order of arrival, ear of
arrival, or by class of material. Recall by 
type of material was better than either of the 
other methods. Harrison (1967) and Sanders
and Schroots (1968) also showed that grouping
on the basis of conceptual category is
possible in recall when instructions to this
effect are given.


Time-Dependent Encoding Hypothesis 


In a study described earlier, Bregman 
(1968) showed that semantic, phonemic, and 
graphic information tend to be equally salient
and forgotten at the same rate when tested
with a probe technique. In Bregman’s experiment
subjects were forced, by the nature of
the task, to use semantic cues in retrieving
information from short-term memory. When
subjects are free to control completely their
encoding strategy, there is some evidence that
semantic encoding is not used as freely. Thus,
Kintsch and Buschke (1969) used the serial
probe method to investigate the effects of phonemic
and semantic similarity in short-term
memory. Similarity was manipulated by including
either eight synonym or eight homonym
pairs in a list of 16 words. The recency
portion of the retention curve, which indexes 
short-term store, was depressed by phonemic 
similarity but not by semantic similarity. The 
asymptotic portion of the retention curve be- 
haved in the opposite fashion. The authors in- 
terpreted these findings as indicating that 
short-term store is primarily phonemic and 
long-term store is primarily semantic. 

The difference between Bregman's results 
and Kintsch and Buschke's may be explained 
in either of two ways. First, it can be argued 
that subjects in Bregman’s experiment did 
not encode information in short-term store 
semantically, even when tested with semantic 
probes. Information in short-term store could
be strictly phonemic, and performance in the 
semantic probe condition would then be ex- 
plained by assuming that subjects recode the 
contents of short-term store into a semantic 
representation at the time of their comparison 
to the probe word. This explanation, then, 
preserves the hypothesis of a strictly pho- 
nemic short-term store offered by Kintsch 
and Buschke. 

An alternative explanation may be based 
on the hypothesis that encoding and rehearsal 
are time-dependent processes that may be 
time-shared and traded off with one another, 
as was suggested in an earlier section. Peter- 
son (1969) has also recently explicated the 
hypothesis that verbal activities may be time- 
shared, and has provided data supporting his 
hypothesis. If, in addition to time sharing, it 
is also assumed that rehearsal is a useful 
means of maximizing retention and that the 
encoding of phonemic features is faster than 
or begins before the encoding of semantic 
features, then it follows that the use of pho- 
nemic encoding is a useful means of maximiz- 
ing the amount of rehearsal and therefore the 
amount retained. Hence, whenever the en- 
coding of semantic features is not a task 
demand and would lead to a loss of rehearsal 
opportunity, encoding in short-term memory 
will be primarily phonemic. This is far dif- 
ferent from claiming that the memory trace 
in short-term store is by nature phonemic.
The evidence just reviewed on semantic similarity
effects in the distractor paradigm at
first seems inconsistent with this hypothesis 
since the emphasis on rehearsal is great in 
this paradigm, and there is no apparent task 
demand for the use of semantic information. 
However, the experiments done by Wickens, 
Hofer, Loess, and Calfee and Peterson, and 
others cited earlier invariably share the 
features of either using very slow rates of 
incoming information (generally about one 
trigram per minute) or using blocks of trials 
or blocks of items consisting of conceptually 
similar materials. Under these conditions it 
is not at all surprising that subjects can en- 
code the conceptual nature of the material 
used. The fact that increasing the extent to 
which vocal (audible) rehearsal is required 
causes an increase in retention and also in the 
proportion of phonemically based confusions 
(Murray, 1965) fits nicely with the hypothe- 
sized tradeoff between encoding and rehearsal, 
assuming that the requirement to vocalize 
maximizes the role of rehearsal. 
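
The hypothesized tradeoff between encoding and rehearsal can be made concrete with a toy calculation. The sketch below is illustrative only: it assumes that each item occupies a fixed time slot and that encoding is completed before rehearsal begins, which simplifies the time-sharing idea. The encoding times are hypothetical; only their ordering (phonemic faster than semantic) is taken from the hypothesis.

```python
# Hypothetical encoding times in seconds per item; only the ordering
# (phonemic faster than semantic) is taken from the hypothesis.
ENCODING_TIME = {"phonemic": 0.2, "semantic": 0.6}

def rehearsal_time(seconds_per_item, code):
    """Time left for rehearsal in one item's slot after encoding; never negative."""
    return max(0.0, seconds_per_item - ENCODING_TIME[code])

def rehearsal_share(seconds_per_item, code):
    """Fraction of the slot that remains available for rehearsal."""
    return rehearsal_time(seconds_per_item, code) / seconds_per_item

# Fast input: one item per second. Slow input: one item per minute.
fast = {code: rehearsal_share(1.0, code) for code in ENCODING_TIME}
slow = {code: rehearsal_share(60.0, code) for code in ENCODING_TIME}
```

With these assumed values, semantic encoding costs a large share of the rehearsal opportunity at the fast rate and almost none at the slow rate, which is consistent with the observation that conceptual encoding appears when incoming information arrives slowly.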


Semantic Similarity in Long-Term Memory 


The salience of semantic similarity in long- 
term memory is an often cited fact and has 
been demonstrated in studies using serial 
and paired-associates learning. Most of these 
studies have been concerned with retroactive 
interference, and only a few are cited as ex- 
amples. McGeoch and McDonald (1931) 
studied retroactive interference in serial learn- 
ing as a function of the meaningful similarity 
between words used in two successive lists. 
The amount of retroactive interference in recall
increased as a function of similarity. Effects
of semantic similarity in paired-associate
transfer were demonstrated by Ellis (1958) 
who varied the degree of synonymity of stim- 
uli in successive lists, and by Underwood 
(1951) who manipulated response similarity. 
In both cases the amount of positive transfer 
from List 1 to List 2 increased with meaning- 
ful similarity. The influence of semantic fac- 
tors is also one of the most interesting features 
of performance in free recall tasks (cf. Bous- 
field, 1953). 

There is little doubt that short-term mem- 
ory is affected by semantic similarity. How- 
ever, the majority of the evidence reviewed in 
this section is ambiguous with respect to the
properties of short-term store. Immediate recall
techniques, which heavily weight short-
term store, have been infrequently used to in- 
vestigate semantic similarity, and the data 
available are equivocal. Semantic effects are 
easily demonstrated with the Brown-Peterson 
distractor method, but these experiments use 
retention intervals long enough so that the 
relative weights of short-term store and long- 
term store cannot be assessed. Studies that 
require grouping on the basis of conceptual 
properties in immediate recall provide better 
evidence for semantic effects in short-term 
store and point to the importance of instruc- 
tions as a means of getting subjects to encode 
semantic properties. Results obtained with 
probe techniques are mixed but seem to indi- 
cate that when semantic encoding is required 
(e.g., Bregman, 1968) it can be used, and 
will be fully as salient in short-term store as 
phonemic codes. The ability to vary encoding 
strategies as a function of instructions or 
task demands has been clearly demonstrated 
by Tversky (1969). She showed that material 
can be encoded either visually or verbally in 
short-term memory and that the type of encoding
used, for either verbal or pictorial
stimuli, is determined by the subjects' expectations
about whether an immediate recognition
test would use verbal or pictorial material.


SUMMARY: THEORETIC IMPLICATIONS OF
SIMILARITY EFFECTS IN SHORT-TERM 
MEMORY 


In order to explain the occasional nature of 
semantic and other nonphonemic encoding in 
short-term memory, a hypothesis about the 
time-dependent nature of the encoding process 
was offered. Codes based on phonemic char- 
acteristics are obtained faster than semantic 
codes, hence leave more time for rehearsal, 
which is seen as a useful strategy in a short- 
term memory task. The encoding of semantic 
or graphic dimensions is possible, and when 
task demands require the use of such information,
evidence for confusions, grouping, recall,
or recognition on the basis of such features
can be obtained. Thus, the fact that
evidence for phonemic encoding in short-term 
memory is easier to obtain than evidence for 
semantic encoding does not speak to the
issue of whether or not short-term and long-
term memory involve theoretically distinct
storage systems, short-term and long-term
store. This issue must be decided on other 
grounds, as for example, the usefulness of 
short-term store as an explanatory concept in 
the study of such phenomena as the recency 
effect in free recall (Glanzer & Cunitz, 1966), 
or the effects of massed versus distributed item 
repetitions on recall performance (Glanzer, 
1969). 

The fact that similarity-based confusions 
are readily found in immediate recall tasks 
does place at least one constraint on any 
theory of memory incorporating the concept 
of short-term store. In order to explain con- 
fusions it must be hypothesized that short- 
term traces do not have an all-or-none char- 
acter. Since confusion errors are found even 
when there is evidence of correct perception 
(Wickelgren, 1965a, 1965b, 1965c, 1965d, 
1966a, 1966b), it seems necessary to attribute 
confusions to traces which contain only par- 
tial information, either because they have 
partially decayed or are partially recovered 
from unlearning. 

A class of theories typified by Waugh and 
Norman's (1965) buffer model is inconsistent 
with the assumption of partial information. 
The mechanism of information loss postulated 
by Waugh and Norman is a bumping-out 
process which gives an item the property of 
either being in short-term store or not being 
in short-term store, with no intermediate 
status possible. This is inconsistent with the 
idea of partial information and cannot easily 
predict confusion errors. The buffer memory 
model formulated by Atkinson and Shiffrin 
(1968) is less susceptible to this criticism. In 
the verbal structure of their models it is as- 
sumed that short-term forgetting is due to a 
time-dependent decay process. The buffer 
represents a rehearsal strategy of the subject 
which is intended to offset the decay process. 
In the development of specific models from 
this structure, Atkinson and Shiffrin have al- 
ways made the simplifying assumption that 
the decay rate is fast enough so that once an 
item is removed from the buffer (no longer 
rehearsed) it will completely decay from 
short-term store before it is tested. If this
restrictive assumption were relaxed, a formulation
like Atkinson and Shiffrin's could be extended
to account for confusions based on
similarity, by admitting the concept of par- 
tially decayed traces. Such a model would 
also have the useful property of describing 
trace strength at some time into a retention 
interval as the net result of trace decay and 
rehearsal. This is a necessary property for 
any decay model since forgetting is not a 
simple function of time alone. 

Multimethod factor analysis is reviewed and is found to have substantive and
mathematical problems. First, it assumes uncorrelated methods and second,
orthogonal traits within but not between methods. The validity of these assumptions
is questioned. Furthermore, multimethod factor analysis is shown to yield
a “very satisfactory delineation of trait factors [Jackson, 1969, p. 30]” because of
a built-in bias. This bias is demonstrated and proved. Suggestions are made as to
when multimethod factor analysis might be appropriate; however, it is recommended
that other methods be used instead. Several alternative techniques are
discussed.


Multimethod factor analysis (Jackson, 1969) 
was developed to demonstrate convergent and 
divergent validity among traits in a multi- 
trait-multimethod matrix following the model 
set forth by Campbell and Fiske (1959). While 
in agreement with most of the points made, 
some substantive and mathematical problems 
arise upon a closer investigation of multi- 
method factor analysis. On a substantive level, 
Jackson (Equation 2, p. 38) implicitly assumes 
that methods are independent and method 
variance is therefore not a problem in the 
heteromethod submatrices. Perhaps in certain 
cases this assumption can be met; however, 
it would seem that more frequently than not, 
some methods will share variance with other 
methods. It would seem that regardless of the 
technique employed, method covariance should 
be taken into consideration. An additional 
assumption, with contradictory implications 
for the traits involved, is that traits are in- 
dependent within a method (hence the identity 
matrix) but correlated between methods. This 
latter assumption, along with the former, 
places stress on the matrix, and together they 
can cause the non-Gramian properties that 
Jackson mentions. 

The mathematical problem seems somewhat 
more serious and exists even if the assumption 
concerning independence of methods can be 
met (but failure to meet this assumption could 
actually mitigate the problem). The problem
is that the insertion of an identity matrix into
the monomethod blocks will bias the results in
the direction of confirming convergence for 
similar traits and independence or divergence 
for dissimilar traits. If the multimethod tech- 
nique is applied to a perfectly homogeneous ma- 
trix (except for unities in the main diagonal) 
the bias can be seen quite readily. A homogeneous
matrix under the usual factor-analytic techniques
(with properly chosen communalities)
would emerge as being of unit rank; however,
when such a matrix is subjected to multi- 
method factor analysis, the results would show 
convergent and divergent validity. Further- 
more, the convergence and divergence is merely 
a function of the order of the variables within 
methods, and different traits could converge (or 
similar traits diverge) if rows and columns 
of such a matrix were permuted. Clearly, 
convergent and divergent validity does not 
exist in a homogeneous matrix. For example, 
consider two traits (A and B) under two 
methods (1 and 2) which have a correlation 
matrix that has all off-diagonal elements equal 
to .49. A communality estimate of .49 would 
lead to a single principal-axis factor which 
would show all traits having equal factor 
loadings of .70. Multimethod factor analysis,
on the other hand, would result in four factors
(Table 1) that could by proper rotation be
used to support convergence and divergence.
Obviously, a reversal of traits within one of 
the methods would result in different traits 
converging and diverging. 
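
The homogeneous example can be checked numerically. What follows is a minimal sketch in Python with NumPy, not part of the original analysis; the function names are illustrative assumptions. It builds the four-variable matrix (order A1, B1, A2, B2) with every off-diagonal correlation equal to .49, substitutes identity matrices for the monomethod blocks as multimethod factor analysis prescribes, and extracts principal components.

```python
import numpy as np

def multimethod_matrix(r, n_traits=2, n_methods=2):
    """Homogeneous correlation matrix (all off-diagonal entries r) with
    identity matrices substituted for the monomethod blocks."""
    n = n_traits * n_methods
    R = np.full((n, n), float(r))
    for m in range(n_methods):
        block = slice(m * n_traits, (m + 1) * n_traits)
        R[block, block] = np.eye(n_traits)  # zero within-method trait correlations
    return R

def principal_components(R):
    """Return roots in descending order and the matching loadings
    (eigenvector scaled by the square root of its root)."""
    roots, vectors = np.linalg.eigh(R)
    order = np.argsort(roots)[::-1]
    roots, vectors = roots[order], vectors[:, order]
    loadings = vectors * np.sqrt(np.clip(roots, 0.0, None))
    return roots, loadings

R_mm = multimethod_matrix(0.49)
roots, loadings = principal_components(R_mm)
print(np.round(roots, 2))                    # roots are approximately 1.98, 1.00, 1.00, 0.02
print(np.round(np.abs(loadings[:, 0]), 2))   # first factor: every variable loads about .70
```

Four factors emerge from a matrix that, by construction, has only one source of common variance; the two factors with unit roots are the ones that rotation can turn into apparent trait factors.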

TABLE 1
FACTORS OF HYPOTHETICAL HOMOGENEOUS MATRIX

                  Multimethod factors        Average T_MM factors    Rotated factors
Method  Trait     I      II     III    IV        I      II              I      II
1       A1       .70    .50    .50    .07       .70    .50             .85    .15
1       B1       .70   −.50   −.50    .07       .70   −.50             .15    .85
2       A2       .70    .50   −.50   −.07       .70    .50             .85    .15
2       B2       .70   −.50    .50   −.07       .70   −.50             .15    .85

Roots           1.98   1.00   1.00   0.02      1.98   1.00

Demonstration of biasedness can be shown
in other ways as well. If the supposedly similar
traits are truly identical under the different
methods (and are measured equally well under
the different methods) then each of the
heterotrait submatrices would have an identical
expected structure T in the population. The
diagonal of T contains the communality of the
variables, and the off-diagonals would reflect
the interrelationships among the traits. The
matrix T would not need to be a diagonal
matrix to support convergent and divergent
validity, but its main-diagonal elements would
merely need to be larger than the corresponding
off-diagonal elements, as outlined by Campbell
and Fiske (1959) for heteromethod matrices.
The expected value³ of the entire correlation
matrix (which would be analyzed in a principal
axis analysis) is given by:


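$$E(R) = \begin{bmatrix} T & T & \cdots & T \\ T & T & \cdots & T \\ \vdots & \vdots & \ddots & \vdots \\ T & T & \cdots & T \end{bmatrix} + \begin{bmatrix} M_1 & & & 0 \\ & M_2 & & \\ & & \ddots & \\ 0 & & & M_m \end{bmatrix} + U^2 \qquad [1]$$

where $M_i$ represents method variance within
each of the $m$ monomethod blocks, and $U^2$ is a
diagonal matrix.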
Conger (1969) in the development of averoid
factor analysis has shown that an averaging of
the heteromethod submatrices provides an
estimate of $T$ which under the above assumptions
would be unbiased. The average is
denoted $T_{..}$:

$$T_{..} = \frac{1}{m(m-1)} \sum_{i=1}^{m} \sum_{\substack{j=1 \\ i \neq j}}^{m} T_{ij} \qquad [2]$$

where $T_{ij}$ is the heterotrait matrix for methods
$i$ and $j$. Model I of averoid factor analysis
suggests that the appropriate matrix to factor
is $T_{..}$, and convergence and divergence can be
determined from this and from the nature of
the residual correlations. The hypothesized
factor structure of $R_{T..}$ is assumed to be

$$F = \begin{bmatrix} F_0 \\ F_0 \\ \vdots \\ F_0 \end{bmatrix} \qquad [3]$$

where $T_{..} = F_0 F_0'$. Based on this, the expected
value of the reconstructed correlation matrix,
$R_{T..} = FF'$, would be

$$E(R_{T..}) = \begin{bmatrix} T & T & \cdots & T \\ T & T & \cdots & T \\ \vdots & \vdots & \ddots & \vdots \\ T & T & \cdots & T \end{bmatrix} \qquad [4]$$

Note that this method of analysis is in full
agreement with the requirements laid down
by Jackson.

² Deviations from this assumption merely require
more manipulation and do not substantially change any
of the arguments. Conger (1969) provides models
to handle nonindependent methods and weighted
measurement.

³ Actually, $T_{..}$ is not an unbiased estimate
for the same reasons that $r_{xy}$ is not an unbiased estimate
of the population correlation $\rho_{xy}$; however, for reasonably
large sample sizes the bias is negligible and can be
ignored. Thus, all of the expectations are approximate
in that a small amount of bias is involved.

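The overall average of the submatrices in the
matrix to be factored is given in general by

$$\bar{T} = \frac{m-1}{m}\left[\frac{1}{m(m-1)} \sum_{i=1}^{m} \sum_{\substack{j=1 \\ i \neq j}}^{m} T_{ij}\right] + \frac{1}{m}\left[\frac{1}{m} \sum_{i=1}^{m} T_{ii}\right] \qquad [5]$$
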
After the appropriate substitutions into the
above equation, the principal axis analysis can
be seen to utilize for its average matrix

$$T_{PA} = \frac{m-1}{m}\, T_{..} + \frac{1}{m^2} \sum_{i=1}^{m} T_{ii} \qquad [6]$$

which has an expected value of

$$E(T_{PA}) = T + \bar{\sigma}_M^2 \qquad [7]$$

where $\bar{\sigma}_M^2$ is the average of the method variances.
For multimethod analysis, Equation 5
produces an average value

$$T_{MM} = \frac{m-1}{m}\, T_{..} + \frac{1}{m}\, I \qquad [8]$$

(since an identity matrix $I$ is substituted for
$T_{ii}$) which yields

$$E(T_{MM}) = \frac{m-1}{m}\, T + \frac{1}{m}\, I \qquad [9]$$

The average for the averoid technique is
merely $T_{..}$ (since $T_{..}$ is substituted for $T_{ii}$)
with an expected value of $T$. Thus the principal
axis method has been shown to bias the solution
so as to deny convergence and divergence
since the constant $\bar{\sigma}_M^2$ would tend to emphasize
a unit rank structure. The multimethod technique,
on the other hand, adds unity into the
diagonal and zero into the off-diagonal, thus
biasing the diagonal toward larger values (and
therefore convergence) and the off-diagonal
toward values nearer zero (and therefore divergence).
Furthermore, the degree of biasedness
depends on the number of methods under
which each trait has been observed. There is a
maximum bias when two methods are used,
and the biasedness decreases as more and more
methods are employed. For the hypothetical
matrix analyzed above, the average matrix
(Equation 8) would have diagonal values of
.745 and off-diagonal values of .245. This
matrix has two factors (Table 1) which can be
rotated to show convergent and divergent
validity. Furthermore, they are identical to the
first two factors found in the multimethod
factor analysis and, thus, the same convergent
and divergent validity is “found.”
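
The figures .745 and .245 follow directly from the multimethod average. As a rough illustration (a sketch only, assuming the average takes the form ((m − 1)/m) T + (1/m) I with the homogeneous T of the example; `multimethod_average` is a hypothetical helper name, not one used by either author), the computation and its dependence on the number of methods can be written in Python:

```python
import numpy as np

def multimethod_average(T, m):
    """Average submatrix under multimethod factor analysis, assuming the form
    ((m - 1) / m) * T + (1 / m) * I, with T the averaged heteromethod block."""
    T = np.asarray(T, dtype=float)
    return ((m - 1) / m) * T + np.eye(T.shape[0]) / m

# Homogeneous example: two traits, every heteromethod correlation equal to .49.
T = np.full((2, 2), 0.49)
T_mm = multimethod_average(T, m=2)
print(np.round(T_mm, 3))            # diagonal .745, off-diagonal .245

# Roots of the average matrix, largest first.
avg_roots = np.linalg.eigvalsh(T_mm)[::-1]
print(np.round(avg_roots, 2))       # approximately 0.99 and 0.50

# Diagonal minus off-diagonal of the average matrix for several m.
# T itself has no such gap, so the whole gap (1 / m) is produced by the identity blocks.
gap_by_m = {m: float(np.diff(multimethod_average(T, m)[0, ::-1])[0]) for m in (2, 3, 5, 10)}
print(gap_by_m)
```

The gap between diagonal and off-diagonal entries is 1/m, largest for two methods and shrinking as methods are added, which matches the statement that the bias is maximal when two methods are used.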
It should be stressed that the proof of bias
was based on the assumption, made by Jackson,
that the methods are independent. This assumption
was challenged previously as being
unlikely to be met, but a deviation from it
could be favorable for multimethod factor
analysis. If methods were interrelated in a
positive fashion, the bias could be reduced.
But this is similar to two wrongs making a
right, and there is no guarantee that the
relationship would be positive; therefore, it
would seem more appropriate to use a model
that did not require independence of methods.
However, allowing for method factors raises 
some important questions. If there are general 
method factors, are they correlated with the 
trait factors or are there general factors 
common to both traits and methods? Is some 
provision made for a possible interaction be- 
tween traits and methods? 

Multimethod factor analysis does not come 
to terms with these problems even though it 
was developed to avoid the confounding of
method and trait variance usually inherent
in principal axis factor analysis. If method and 
trait factors were usually independent, no new 
technique would have been required. In its 
current formulation, multimethod factor analy- 
sis cannot resolve questions about interrelated 
method and trait factors; and because of the 
other problems inherent to it, it would seem 
worthwhile to look at other techniques which 
could answer these questions. The averoid fac- 
tor-analytic technique, which was presented 
for purposes of explicating the bias inherent in 
the multimethod and principal axis techniques, 
would be subject to these same criticisms; how- 
ever, as pointed out below, it can be sequenti- 
ally applied in such a way as to circumvent 
these criticisms. In addition, there are several 
other models which also attempt to meet the 
challenges of a multitrait-multimethod matrix.

Boruch and Wolins (1968) were perhaps the 
first to design a confirmatory technique specific- 
ally for application to a multitrait-multimethod 
matrix. Although some of the constraints in 
their technique may be too restrictive, their 
basic approach allows general factors and
uncorrelated method and trait factors. They
classify their technique as a special case of a
class of techniques developed by Jöreskog and
Gruvaeus (1967). McDonald (1969) worked
on a technique of generalized common factor
analysis based on residual matrices of a prescribed
structure which can be applied in a
confirmatory fashion to a two-mode problem
such as a multitrait-multimethod matrix. The
McDonald technique is a type of generalized 
interbattery factor analysis and thus may be 
subject to criticisms leveled at this type of 
technique (Kristof, 1967). Of course, if a more 
exploratory technique is desired, Tucker’s 
multimode factor analysis (1963, 1966) can be 
applied (cf. Hoffman & Tucker, 1964). Jöreskog
(1969) has developed a general model for
analyzing multitest-multioccasion data from
which more specific models can be derived depending
on the tenability of various assumptions.
Jöreskog’s technique allows for the extraction
of general factors common to traits
and methods, but the trait-specific and method-
specific factors are otherwise orthogonal. The
Jöreskog techniques are probably the most
powerful, in that a good fit to the model is 
provided by either a least squares approach or 
a maximum likelihood method; however, they 
are not completely general due to the imposed 
orthogonality on method and trait factor sets. 
Conger (1970) by using a sequential applica- 
tion of averoid factor analysis allows for the 
separate extraction of trait and method factors
from which no other variance has been removed
or from which the opposing type of
variance has been removed. The technique
allows for general factors as well as correlated
method and trait factors. The technique is not
as powerful as far as fit to data is concerned
when compared to Jöreskog's technique, but
it possesses a conceptual clarity which provides
a good basis for understanding the desirable
characteristics of any solution to the
multitrait-multimethod problem and is the only
solution which allows for correlated trait and 
method factors. In applications to various sets 
of data (Conger, 1970) it has yielded good 
results. 

In any case, there are a variety of techniques 
available which avoid the bias of multimethod 
factor analysis and principal axis factor analy- 
sis and additionally allow for the extraction of 
method factors. Of these techniques, three are
suggested as being most appropriate: If an
exploratory analysis is desired, Tucker’s multimode
(Tucker, 1963, 1966) is best; if a confirmatory
technique is desired which allows
for the best fit, Jöreskog’s (1969) technique
is most appropriate; but if correlated method
and trait factor sets are sought, Conger’s
(1970) analysis would be more appropriate.


SUMMARY 


Although multimethod factor analysis might 
represent an improvement over informal 
methods, it is based on assumptions (a) which 
are difficult to meet (i.e., uncorrelated meth- 
ods) and (b) which are contradictory for the 
variables involved (viz, that traits are in- 
dependent within a method, but may be 
related between methods). A more serious 
problem is that the results promise to be good, 
that is, they are biased in the direction of 
finding convergent and divergent validity. 
This latter flaw can be somewhat ameliorated 
by using several methods, and is not very 
serious where there already is a strong tendency 
for the average heterotrait method to be a
diagonal matrix. The bias might also be 
counterbalanced if the assumption of method 
independence were not met. To the extent that 
methods are positively correlated, the structure 
would tend toward unit rank, and trait validity 
would be obscured; however, the bias of the 
multimethod technique could cancel this out. 

Because of its bias, multimethod factor 
analysis does not appear to be the best ap- 
proach toward confirming convergent and 
divergent validity. On the other hand, principal 
axis factor analysis will not clearly show trait 
convergence and divergence because of the 
influence of method variance. This holds 
whether or not methods are correlated. 
Principal axis factor analysis will confound 
trait factors with method variance even if 
methods are uncorrelated, and it could distort 
the trait and method factors even more
seriously if methods are correlated. If principal
axis factor analysis were combined with rota- 
tions to some a priori trait structure rather 
than combined with rotations to simple struc- 
ture, it could probably shed some light on 
trait validity, but trait variance would still 
be contaminated with method variance. 

An additional criticism of multimethod 
factor analysis is that it does not provide 
information about method factors and their
relationship to the trait factors. However, 
there are methods which do suggest ways of 
avoiding the problems inherent in the multi- 
method technique while avoiding the confounding
inherent in regular factor analysis.
Unfortunately, most are not readily available
except as unpublished manuscripts or technical
reports (cf. Boruch & Wolins, 1968; Conger,
1969, 1970; Jöreskog, 1969; McDonald, 1969).

Conger’s proof regarding bias in multimethod factor analysis is shown to rest
on an irrelevant model and is, therefore, in error. The type of bias con- 
jectured by Conger to exist is impossible to demonstrate on either real or 
Monte Carlo data. Other conclusions by Conger are based upon a failure to 
distinguish between principal components and factor-analytic models, and upon 
an implicit redefinition of method variance. Cautions are expressed regarding 
the use of alternative factor-analytic models cited by Conger for the evaluation 
of multitrait-multimethod matrices because of non-Gramian properties of 
averoid factor-analytic matrices, the possibility of experimenter bias in 
parameter-fitting models, and the fact that the identification of trait factors 
permits generalizations about the convergent and discriminant validity of 


factor scores, and not necessarily about the validity of raw scores. 


Multimethod factor analysis (Jackson, 
1969), proposed as a technique for the evalua- 
tion of multitrait-multimethod matrices, was 
based on an interpretation of the Campbell 
and Fiske (1959) problem which implied a 
set of general trait factors and a separate set
of method factors unique to each method.
This interpretation recognized the fact that
there were frequently more distinguishable
trait and method parameters than independent
sources of data, and sought to identify
separate traits by substituting identity matrices
for monomethod submatrices of the
correlation matrix (thus eliminating variance
unique to each method) and proceeding to
utilize the method of principal components
(Morrison, 1967) on the modified matrix to
locate the direction of principal axes accounting
for maximum variance in heteromethod
submatrices. This interpretation of
the Campbell and Fiske problem requires
careful analysis, as does the technique suggested
for its solution. Conger (1971) has
provided essentially irrelevant criticisms. In 
the interests of brevity, issues raised by
Conger’s paper will be outlined in numbered
form.

¹ Requests for reprints should be sent to Douglas
N. Jackson, Department of Psychology, University
of Western Ontario, London 72, Canada.

² Jackson, D. N. Component scores for multimethod
factor analysis. Paper presented at the meeting
of the Psychometric Society, Stanford, California,
March 1970.

1. Conger’s conclusions are based on an 
implicit substitution of a new unstated model 
for that of multimethod factor analysis. 
Conger does not state what model he refers to 
in his critique, but it is not that of Jackson. 
Conger’s model is apparently based on the 
restrictive assumptions that supposedly simi- 
lar measures of traits are “truly identical” 
and that they are measured “equally well” 
under the different methods. These assump- 
tions, rarely, if ever, met in practice, are re- 
quired neither by Campbell and Fiske nor 
Jackson, nor are they appropriate conditions 
to impose on the evaluation of multimethod 
factor analysis. His method is not in full 
agreement with the requirements of Jackson.

2. It is impossible to find any evidence for 
the kind of bias “demonstrated and proved”
by Conger in either real or Monte Carlo data. 
Conger’s attribution of bias is based on the 
application of his averaging model, irrelevant 
to multimethod factor analysis. A series of 
multimethod factor analyses of original and 
permuted matrices failed to uncover any evidence
of the kind of bias alluded to by
Conger.

3. Conger fails to distinguish between principal
components analysis and factor analysis.
Conger's comparison of results based on a
factor analysis of a contrived homogeneous
matrix with those derived from multimethod
factor analysis also involves an implicit mixing
of models.³

4. Confusion results from lack of explicit 
definition. Conger (1971) offers no explicit 
conceptual definition of method variance, and 
in fact used the term in different and mutually 
inconsistent ways in different parts of his 
paper. Jackson (1969, p. 40) recognized “that 
methods may differ in degree and that the 

identification of method factors is an im- 
s venture," but chose to define method 
variance “as that variance specific to a given 
method of measurement," but he counseled 
“sound judgment in identifying distinctly dif- 
ferent methods, . . ." Conger criticizes Jack- 
son for “implicitly assuming that methods are 
independent and method variance is not a 
problem in heteromethod submatrices," but in 
his own Formula 1 he defines a diagonal ma- 
trix of uncorrelated method factors, later re- 
ferring to correlated method factors. Jackson 
defined method variance in one way, Conger 

criticized Jackson for failing to adhere to a 

different definition, but reverts to Jackson's 

original definition. 

5. Averoid factor analysis involves restric- 
tive assumptions and, in general, the analysis 
of non-Gramian matrices. Conger offers no 
means for evaluating the degree to which one 
has met assumptions in averoid factor analy- 
sis, which are indeed considerably stronger 
than those implied by Campbell and Fiske. 
If assumptions can be met (no easy task), 
then the application of such a procedure mis- 
directs the focus in the direction of seeking 
general factors, rather than more differenti- 
ated trait-specific factors. If restrictive as- 
sumptions are not met in averoid factor analy- 
sis, difficulties arise which are in proportion 
to failure to meet assumptions. For example, 
the estimation of method factors will be biased
to the extent that traits are not measured
equally well by different methods. In general,
average matrices will not be Gramian, and,
under certain conditions, averaging can result
in a null matrix, or in a negative definite
matrix.

³ Components analyses of correlation matrices with
off-diagonal matrices of less than unity and unity in
the diagonals will always yield as many nonnegative
roots as there are variables. The problem posed by
Conger of interpreting too many factors may be
avoided by employing Kaiser’s rule of retaining
only components whose roots exceed unity, since
either components analysis or multimethod factor
analysis of a homogeneous matrix will always yield
one large root, with the remaining roots equal to or
less than unity.

6. Standard factor-analytic treatment is not 
an appropriate model for the problem posed 
by Campbell and Fiske. Conger’s recommenda- 
tions regarding application of various factor- 
analytic models overlooks an important dis- 
tinction between raw scores and factor scores. 
Campbell and Fiske sought to develop rules 
to evaluate the convergent and discriminant 
validity of raw scores, that is, the type of 
score used by the vast majority of test practi- 
tioners. But factor-analytic findings of sepa- 
rate trait and method factors refer to the 
separation of factor scores, not raw scores— 
every factor is seen as having a determinant 
contribution to every test, and, similarly, fac- 
tor scores are determined by every test in the 
battery. In a manner consistent with the op- 
eration of the suppressor variable, or with 
the related partial correlation, wholly uncor- 
related raw scores can load appropriate trait 
factors. Campbell and Fiske would not have 
attributed convergent and discriminant valid- 
ity to two measures correlating negatively. 
But Jackson and Messick (1961) reported an 
instance in which true- and false-keyed De- 
pression items correlate —.11 but shared the 
highest loadings in the same direction in 
uniquely defining a common factor after re- 
moval of response set variance. This illustrates 
how it is entirely possible to have trait defini- 
tion at the factorial level in the absence of 
evidence for the Campbell-Fiske criteria at 
the raw score level. The isolation of a relevant 
trait factor might be considered a necessary 
but not a sufficient basis for concluding that 
a given set of test scores possesses convergent 
and discriminant validity. 

7. The application of parameter-fitting
models to multitrait-multimethod matrices,
such as those proposed by Jöreskog⁴ and by
McDonald,⁵ is subject to a similar type of
capitalization upon chance as that identified
in the rotation of axes (Humphreys, Ilgen,
McGrath, & Montanelli, 1969). The latter
authors demonstrated that it was possible to
rotate the results of factor analyses of random
data to positions that confirmed hypotheses
about expected structure, particularly when
number of variables and of factors was large
in comparison with number of observations.
The use of procrustes-type rotations or of
maximum likelihood factor analysis on the
Campbell-Fiske problem should be undertaken
cautiously with an awareness both of
the inevitable discrepancies between sample
and population parameters and the need for
replication, and of the possible viability of
alternative models.

⁴ Jöreskog, K. G. A general method for the analysis
of covariance structures. Paper presented at the meeting
of the Psychometric Society, Stanford, California,
March 1970.

⁵ McDonald, R. P. A generalized common factor
analysis based on residual covariance matrices of
prescribed structure. Paper presented at the meeting
of the Psychometric Society, Princeton, New Jersey,
March 1969.

NONPARAMETRIC INDEXES FOR SENSITIVITY AND BIAS:
COMPUTING FORMULAS

J. BROWN GRIER¹

Northern Illinois University


Computing formulas are derived for two nonparametric indexes of sensitivity and 
bias that have been suggested for signal detectability studies. A relationship is 
shown between the sensitivity index and P(I), a statistic whose sampling variability
is known. An additional index of bias is proposed, which is free of certain
inconveniences, yet yields identical isobias contours. Use of the new indexes is 


illustrated with several sets of data. 


Development of the theory of signal detectability
has led to a renewed interest in the
possible processes involved in perception, 
psychophysics, and recognition memory. To a 
considerable extent both theory and research 
in these areas have rested on specific assump- 
tions about the underlying distributions (as 
in the various threshold theories versus 
normality). But even without explicit assump- 
tion about the distributions, data are often 
judged by how close they lie to operating 
characteristic curves derived from normal 
distributions, and different experimental condi- 
tions are characterized by their value of d'. 
Recently there has been a growing interest in 
various “nonparametric” analyses of detection/ 
recognition experiments, where specific under- 
lying distributions are not assumed. 

Following one line of development, Green 
(1964) has shown that for experiments using 
the yes-no procedure, the area under the 
(theoretical) operating characteristic curve can 
be interpreted as the percentage correct on an 
equivalent unbiased forced-choice test, and 
that this is true for any continuous underlying 
distributions. The sampling variability of this 
area measure has been determined by Pollack 
and Hsieh (1969). A similar proof for rating- 
scale experiments is due to Green and Moses 
(1966). However, using data to estimate the 
area under a curve of unknown theoretical 
shape presents difficulties. One expedient has 
been to connect the points and use the trapezoidal
rule (Green & Moses, 1966; Pollack,
Norman, & Galanter, 1964). If the true function
is convex and the points precisely determined,
this method is biased and will underestimate
the true area; Simpson's rule should
be better. However, exact area estimation will
depend on having a functional relation between
data and the area. Also, the area
analysis, while sufficient to describe the data,
does not preserve the notions of sensitivity
and bias.

¹ Requests for reprints should be sent to J. Brown
Grier, Department of Psychology, Northern Illinois
University, Dekalb, Illinois 60115.

In a second approach to nonparametric 
analysis Pollack and Norman (1964) and 
Hodos (1970) have proposed measures based 
on the geometry of the unit square which can 
be interpreted as indexes of sensitivity and 
bias, respectively.

However, neither paper gives functional
expressions for computing their index from the
data, although Hodos does suggest a graphical
estimation procedure. The purpose of the 
present paper is to derive explicit computing 
expressions for the indexes, their associated 
isosensitivity and isobias contours; to show 
the relationship of the new sensitivity index to 
the area measure; and to give examples of 
their use. 


COMPUTING EXPRESSIONS 


In the absence of specific assumptions about
the underlying distributions, and hence the
operating characteristic curves which could
relate data points to the area, Pollack and
Norman (1964) suggested computing an area
statistic A' which is the average of the maximum
and minimum possible areas associated
with a point. Consider the outcome P = (x,y)
of a typical detection experiment plotted in
Figure 1, where x is the probability of a false
alarm, and y the probability of a hit. The two
solid lines through the point and (0,0), (1,1),
respectively, form two nonoverlapping triangles,
A1 and A2, which define the locus of all
possible operating characteristic curves through
the point. The four line segments then define
upper and lower bounds for Green's area measure.
Pollack and Norman's index is then

$$A' = I + \tfrac{1}{2}(A_1 + A_2) \qquad [1]$$

where I is the area under the solid lines. The
index represents the average area under the
upper and lower bounds.

FIG. 1. Typical experimental outcome represented as
a point in the unit square. The probability of a hit
[P(S/S)] is plotted against the probability of false
alarm [P(S/N)].

Dividing the unit square as indicated by the
solid and broken lines in Figure 1, and using
the coordinates of the point (x,y), the different
areas in Equation 1 may be determined to give

$$A' = \frac{1}{2} + \frac{(y - x)(1 + y - x)}{4y(1 - x)} \qquad [2]$$

Solving for y gives the expression for the isoperformance
or isosensitivity curve

$$\hat{y} = \min\left\{1,\; \left[(k/2)^2 + \hat{x}(1 - \hat{x})\right]^{1/2} - (k/2)\right\} \qquad [3]$$

where

$$k = 3 - 4[\hat{x} + A'(1 - \hat{x})].$$

(Uncapped letters refer to data points, and
capped letters to general values.) This curve
can be interpreted as the locus of all points
giving equivalent "average" performance.
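
As a hedged illustration of these computing expressions, the sketch below assumes the sensitivity index is A' = 1/2 + (y − x)(1 + y − x)/[4y(1 − x)] and that the isosensitivity curve is the positive root of the resulting quadratic in y, capped at 1. The function names `a_prime` and `isosensitivity_y` are illustrative choices, not names from the paper, and the expression is applied only to outcomes on or above the chance diagonal.

```python
import math

def a_prime(x, y):
    """Nonparametric sensitivity index for false-alarm rate x and hit rate y.
    Assumes 0 <= x < 1, 0 < y <= 1, and y >= x (at or above chance)."""
    if not (0 <= x < 1 and 0 < y <= 1 and y >= x):
        raise ValueError("requires 0 <= x < 1, 0 < y <= 1, and y >= x")
    return 0.5 + (y - x) * (1 + y - x) / (4 * y * (1 - x))

def isosensitivity_y(x_hat, a):
    """Hit rate on the isosensitivity curve of sensitivity a at false-alarm rate x_hat."""
    k = 3 - 4 * (x_hat + a * (1 - x_hat))
    return min(1.0, math.sqrt((k / 2) ** 2 + x_hat * (1 - x_hat)) - k / 2)

# One outcome: false-alarm rate .205, hit rate .510.
a = a_prime(0.205, 0.510)
print(round(a, 3))                           # about .745

# The curve through that outcome reproduces its hit rate at x = .205.
print(round(isosensitivity_y(0.205, a), 3))  # .51

# Chance performance (y == x) gives A' = .5.
print(a_prime(0.3, 0.3))
```

Evaluating `isosensitivity_y` over a grid of false-alarm rates for a fixed A' traces one isosensitivity contour.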
Pollack and Norman (1964, Figure 2) show
operating characteristic curves for several
values of A'. The curves are similar to normal
operating characteristic curves for small values
of A', but slightly flatter for large values.

The point of intersection with the negative diagonal can be determined as

$$(\hat{x}, \hat{y}) = \left[\frac{2(1 - A')}{3 - 2A'},\; 1 - \frac{2(1 - A')}{3 - 2A'}\right] \tag{4}$$

The value of $\hat{y}$ is P(I) as described by Pollack and Hsieh (1969), who relate it to $d'_e$, a nonparametric sensitivity index suggested by Egan. The expression can be simplified to

$$P(I) = \frac{1}{3 - 2A'} \tag{5}$$

Other experimental outcomes along the iso- 
performance curve presumably differ from P 
because of different criteria or bias. 
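The step from a data point to P(I) can be sketched as follows: compute A' from the point, then take the hit rate at which the isosensitivity curve crosses the negative diagonal. The helper names are hypothetical, and the three (x, y) pairs are rows of Table 1 read with their decimal points restored, which is an assumption of this sketch.

```python
def a_prime(x, y):
    """A' = 1/2 + (y - x)(1 + y - x) / (4y(1 - x)), for y >= x."""
    return 0.5 + (y - x) * (1 + y - x) / (4 * y * (1 - x))


def diagonal_point(a):
    """Point where the curve of value a meets the negative diagonal (x + y = 1)."""
    x_hat = 2 * (1 - a) / (3 - 2 * a)
    return x_hat, 1 - x_hat


def p_i(x, y):
    """P(I) = 1 / (3 - 2A'), the hit rate at the negative-diagonal crossing."""
    return 1 / (3 - 2 * a_prime(x, y))


rows = [(0.205, 0.510), (0.690, 0.925), (0.335, 0.695)]
values = [round(p_i(x, y), 3) for x, y in rows]
# These should land close to the corresponding column 3 entries of Table 1
# (.663, .669, .680).
```

The second coordinate returned by `diagonal_point` equals `p_i` for the same A', which is the simplification the text describes.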

Hodos (1970) has suggested as a non- 
parametric measure of bias the degree to 
which an outcome lies away from the negative 
diagonal. Referring again to Figure 1, now let A1 and A2 refer to the two triangles sharing a common right angle in the upper left corner. If his bias index is called $B'_H$, Hodos has proposed

$$B'_H = \frac{A_1 - A_2}{A_1} \tag{6}$$

for points to the left of the negative diagonal, and the denominator changed to A2 for points to the right. Again using the areas suggested in Figure 1, Hodos' index can be shown to be

$$B'_H = 1 - \frac{x(1 - x)}{y(1 - y)} \tag{7}$$

for points to the left of the diagonal and

$$B'_H = \frac{y(1 - y)}{x(1 - x)} - 1 \tag{8}$$

for points to the right. Solving Equation 7 gives the equation for the isobias contours:

$$\hat{y} = \frac{1}{2} \pm \left[\frac{1}{4} - \frac{\hat{x}(1 - \hat{x})}{1 - B'_H}\right]^{1/2} \tag{9}$$

A family of these curves for different values of $B'_H$ is given in Figure 2 of Hodos (1970).
TABLE 1
ANALYSIS OF TWO SETS OF DATA FROM GREEN AND SWETS

P(S/N)   P(S/S)   P(I)    P(I) - .660   [P(I) - .660]/SD P(I)   P(I) - mean P(I)   [P(I) - mean P(I)]/SD P(I)
 .090     .335    .666       .006               .04                   .003                    .02
 .205     .510    .663       .003               .02                    --                     --
 .400     .715    .661       .001               .01                  -.002                   -.01
 .490     .785    .657      -.003              -.02                  -.007                   -.05
 .690     .925    .669       .009               .06                   .006                    .04

 .040     .245    .640      -.020             -1.30                   .007                    ...
 .130     .300    .618      -.042             -2.73                  -.016                  -1.04
 .335     .695    .680       .020              1.30                   .046                   3.00
 .535     .780    .633      -.027             -1.75                  -.001                   -.06
 .935     .975    .598      -.062             -4.03                  -.036                  -2.33


The change of formula for points to the left and right of the negative diagonal might be inconvenient, and another index is

$$B'' = \frac{A_1 - A_2}{A_1 + A_2} \tag{10}$$

or the difference in the two areas divided by their sum. The computing expression for this is

$$B'' = \frac{y(1 - y) - x(1 - x)}{y(1 - y) + x(1 - x)} \tag{11}$$

This index ranges from +1 to -1, but in a slightly different fashion. The solution for the contours is

$$\hat{y} = \frac{1}{2} \pm \left[\frac{1}{4} - \frac{\hat{x}(1 - \hat{x})(1 + B'')}{1 - B''}\right]^{1/2} \tag{12}$$

The two indexes put identical isobias contours through a given point, and choice between them seems to be a matter of convenience.
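A short sketch may help show why the two indexes share contours: both depend on a point only through the ratio x(1 - x) / y(1 - y), so each is a monotone function of the other, and B'' = B'_H / (2 - |B'_H|). The function names and the four sample points below are hypothetical; points with x(1 - x) = 0 or y(1 - y) = 0 are not handled.

```python
def hodos_bias(x, y):
    """Hodos' index: 1 - x(1-x)/y(1-y) left of the negative diagonal, y(1-y)/x(1-x) - 1 right of it."""
    if x + y <= 1:  # on or to the left of the negative diagonal
        return 1 - x * (1 - x) / (y * (1 - y))
    return y * (1 - y) / (x * (1 - x)) - 1


def b_double_prime(x, y):
    """Alternative index: difference of y(1-y) and x(1-x) divided by their sum."""
    hit_term, fa_term = y * (1 - y), x * (1 - x)
    return (hit_term - fa_term) / (hit_term + fa_term)


# Two illustrative points on each side of the negative diagonal.
points = [(0.10, 0.60), (0.20, 0.50), (0.30, 0.90), (0.60, 0.95)]
pairs = [(hodos_bias(x, y), b_double_prime(x, y)) for x, y in points]

# Each B'' can be recovered from the matching B'_H alone.
recovered = [bh / (2 - abs(bh)) for bh, _ in pairs]
```

Because the mapping between the two values is one-to-one, any set of points with equal B'_H also has equal B'', which is the sense in which the contours coincide.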

If two points along an isoperformance curve 
are thought of as differing due to a change of 
criteria, a more natural index of change might 
be the likelihood ratio criterion. The value of 
this index for each point can be determined 
from the derivative of Equation 3 as 


$$\beta = 2(1 - A') - \frac{5 - 6x - 14A' + 16A'x + 8A'^2(1 - x)}{2\left[x(1 - x) + (k/2)^2\right]^{1/2}} \tag{13}$$

within the limits of the square.
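The likelihood ratio index is the slope of the isosensitivity curve, so a closed-form expression for it can be checked against a numerical derivative. The sketch below writes the slope as 2(1 - A') minus a quotient whose numerator is 5 - 6x - 14A' + 16A'x + 8A'^2(1 - x); this form was obtained by differentiating the curve y = [x(1 - x) + (k/2)^2]^(1/2) - k/2 and should be treated as a reconstruction. All function names are illustrative.

```python
import math


def k_term(x, a):
    """k = 3 - 4[x + A'(1 - x)]."""
    return 3 - 4 * (x + a * (1 - x))


def iso_y(x, a):
    """Isosensitivity curve without the cap at 1 (valid inside the unit square)."""
    k = k_term(x, a)
    return math.sqrt(x * (1 - x) + (k / 2) ** 2) - k / 2


def beta(x, a):
    """Slope dy/dx of the isosensitivity curve of value a at false-alarm rate x."""
    k = k_term(x, a)
    numerator = 5 - 6 * x - 14 * a + 16 * a * x + 8 * a ** 2 * (1 - x)
    return 2 * (1 - a) - numerator / (2 * math.sqrt(x * (1 - x) + (k / 2) ** 2))


def numeric_slope(x, a, h=1e-6):
    """Central-difference estimate of the same slope, used only as a check."""
    return (iso_y(x + h, a) - iso_y(x - h, a)) / (2 * h)


a = 0.75
x_diag = 2 * (1 - a) / (3 - 2 * a)   # crossing with the negative diagonal
slope_at_diagonal = beta(x_diag, a)  # the curve is symmetric, so this is 1
```

A slope of 1 at the negative diagonal corresponds to an unbiased criterion; points farther along the curve in either direction give slopes above or below 1.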


EXAMPLES 


Use of the formulas is illustrated by several examples. Two sets of data reported by Green and Swets (1966, p. 90) are reproduced in Table 1. The top half is from an auditory detection experiment in which the a priori probability of a signal was manipulated, and the lower half from the same experimental setting, but with the values of the decision outcomes varied. The data are plotted in Figures 2 and 3 along with the experimentally expected normal operating characteristic curve of d' = .85. Visual inspection suggests isosensitivity for the data in Figure 2 and possible rejection for Figure 3. The nonparametric index is used to examine isosensitivity. In this case the nonparametric isosensitivity curve is virtually identical in shape to the normal, and the plotted curve can be taken as representing both. For each point the value of P(I) was computed from Equation 5, and is given in column 3. The predicted value of the P(I) based on a d' of .85 is approximately .660. The deviations from the expected value are given in column 4. The standard deviation of P(I) can be read from Figure 17 of Pollack and Hsieh (1969) as about .06 and correcting to N = 600 on which the data are based gives SD P(I) = .0154. The number of standard deviations of each point from the expected is given in column 5. For the first experiment the hypothesis of isosensitivity cannot be rejected. The second experiment requires further analysis. The lower half of column 5 shows that their data do not agree closely with the theoretically expected results. A second question is whether the data points represent a common sensitivity, even though different from the expected one. In the absence of more explicit techniques for combining multiple observations to estimate a common curve the individual P(I)s are averaged and the mean P(I) = .634. The deviation and the number of standard deviations from this value are given in columns 6 and 7, respectively. The fit to the average nonparametric curve is still poor. The center observation is three standard deviations from the mean and just about as far from its nearest neighbor. The hypothesis of a common isosensitivity curve of the Pollack and Norman type does not seem tenable.

FIG. 2. Plot of data from a study by Green and Swets (see Table 1).

FIG. 3. Plot of data from a study by Green and Swets (see Table 1).

TABLE 2
ANALYSIS OF DATA FROM MURDOCK

P(S/N)   P(S/S)   P(I)     d(x,y - N)   d(x,y - NP)
 .053     .460    .7418      .030          .020
 .112     .556    .7407       --          -.006
 ...      ...     .7331     -.020         -.002
 ...      ...     .7364     -.020           --
 .275     ...     .7496     -.020           --
 .309     .793    .7440     -.010          .010
 .363     .839    .7452     -.013          .007
 .497     .913    .7353      .005           --
 .576     .957    .7396      .020           --
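The standard-deviation columns for the second Green and Swets experiment can be recomputed with a few lines. This is only a sketch: the P(I) values are the lower-half column 3 entries of Table 1 as read here, the constants .660 and .0154 are the ones quoted in the text, and the variable names are illustrative.

```python
SD_P_I = 0.0154   # standard deviation of P(I) quoted in the text for N = 600
EXPECTED = 0.660  # P(I) predicted from d' = .85, as quoted in the text

# Column 3, lower half of Table 1 (second experiment).
second_experiment = [0.640, 0.618, 0.680, 0.633, 0.598]

# Deviations from the expected value, in standard-deviation units (column 5).
z_expected = [round((p - EXPECTED) / SD_P_I, 2) for p in second_experiment]

# Deviations from the mean of the five P(I) values (the basis of column 7).
mean_p = sum(second_experiment) / len(second_experiment)
z_mean = [(p - mean_p) / SD_P_I for p in second_experiment]
```

The largest entry of `z_mean` belongs to the center observation (.680), the point the text singles out as lying about three standard deviations from the mean.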

Next, some data from a recognition memory experiment (Murdock, 1965, p. 445), which has been characterized by a normal operating characteristic curve, are examined. The data are reproduced in Table 2 and plotted in Figure 4 with the normal operating characteristic curve of d' = 1.36 estimated by Murdock as a solid line. The values of P(I) for each point are in column 3 and their average is .7406. No point appears to be an outlier, and the average nonparametric curve is plotted in Figure 4 as the dashed line. Observation suggests the data lie closer to the nonparametric curve than to the normal. To examine fit, the data were plotted on an expanded graph, the two curves sketched, and the perpendicular distance from each point to the respective curves was measured with a pair of dividers, in units of the unit square. The signed deviations are given in Table 2.

FIG. 4. Plot of data from a study by Murdock (see Table 2).

FIG. 5. Isobias contour from Hodos (see Table 3).

The average deviation from the nonparametric operating characteristic curve is about one-third that from the normal (.0050 versus .0153).

A 2σ interval around the mean of the unsigned nonparametric deviations includes zero (.0050 ± .0051), while the interval for the normal curve deviations does not (.0153 ± .0066). The nonparametric curve seems to provide a more satisfactory characterization of the data.

Use of the bias index is illustrated with some measurements from Figure 3A of Hodos (1970), part of which is reproduced in Figure 5. The data points lie close to the isobias contour. The location of each point was measured with dividers and is given in Table 3 along with values for B'_H and B''. The significance of the observed differences is difficult to judge without sampling distributions, but the hypothesis of isobias suggested by the graph may be questioned. There may be some error of measurement since the original data were not available, but the example serves to illustrate the difficulty of making isobias judgments visually. The different isobias contours lie quite close together near the corners of the square.

TABLE 3
MEASURED LOCATIONS IN HODOS' FIGURE

  x      y      B'_H     B''
 .09    .97     ...     -.476
 .20    .96     ...     -.613
 .35    .94     ...     -.603
 .63    .94     ...     -.610
 .83    .95     ...     -.496
 .88    .97     ...     -.568
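The bias values for the measured locations can be recomputed directly from the (x, y) pairs. In this sketch the coordinates are the Table 3 entries read with decimal points restored (an assumption about the damaged table), B'' is taken as the difference of y(1 - y) and x(1 - x) divided by their sum, and the names are illustrative.

```python
def b_double_prime(x, y):
    """B'' = [y(1 - y) - x(1 - x)] / [y(1 - y) + x(1 - x)]."""
    hit_term, fa_term = y * (1 - y), x * (1 - x)
    return (hit_term - fa_term) / (hit_term + fa_term)


# Measured locations from Table 3 (x = false-alarm rate, y = hit rate).
table3 = [(0.09, 0.97), (0.20, 0.96), (0.35, 0.94),
          (0.63, 0.94), (0.83, 0.95), (0.88, 0.97)]

values = [round(b_double_prime(x, y), 3) for x, y in table3]
# These reproduce the tabulated -.476, -.613, -.603, -.610, -.496, -.568.

spread = max(values) - min(values)  # range of B'' across the six points
```

The spread of roughly .14 across points that look aligned on the graph is the kind of difference the text has in mind when it questions the visual judgment of isobias.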


Discussion 


These derivations seem to provide new 
indexes that are both informative and easily 
computable for each data point. They are 
nonparametric only in the sense that no 
specific assumptions are made about the shape 
of the underlying distributions. For the sensi- 
tivity index a very specific assumption is made 
about the relationship between the distri- 
butions, but the assumed curve seems to fit 
some existing data at least as well and possibly 
better than symmetric normality. Since the 
assumed curve is symmetric, it might not be 
useful in experiments where the shape of the 
operating characteristic curve is important, or 
where results indicate asymmetry. The new
statistics will be more useful when their 
sampling distributions are tabulated for use 
in making confidence statements, but, mean- 
while, questions of homogeneity within groups, 
and differences between groups can be ex- 
amined with available statistical procedures. 

An analytical expression for the area under 
the nonparametric operating characteristic 
curve would be desirable because of its interpre- 
tation in terms of equivalent forced-choice 
performance. For a single point, one can be 
obtained by using Equation 3, but the inte- 
gration must be done for two parts depending 
on whether A’ is equal to or greater than .75, 
and in either case gives extremely messy 
algebraic expressions which have not yielded 
to simplification. When multiple data points 
are available, the problem of combining them 
into a common area estimate will require 
statistical analysis. Simply computing an area 
for each point and then averaging the different
areas may or may not prove an effective 
procedure. 
SQUARES ANALYSIS OF CATEGORICAL DATA

The three different least squares methods discussed by Overall and Spiegel are evaluated within the framework of making causal interpretation. It is shown that the first two methods discussed by these authors should not be used to make inferences about causation because they necessarily involve contradictory assumptions about causality.


Overall and Spiegel (1969) discussed three 
methods for analyzing the "effect" of cate- 
gorical variables using least squares regression 
procedures. When applied to data from 
balanced experimental designs, the three 
methods yield identical results. As Overall and 
Spiegel clearly showed, however, the three 
methods can yield substantially different 
results when applied to data with dispro- 
portionate cell frequencies. Overall and Spiegel 
have made a real contribution by noting the 
differences in the three methods and emphasizing that interpretation of results must depend on the method used.

The purpose of this note is to point out some problems in the use of these methods to make causal interpretations about the "influence" of categorical variables. All three methods are variations of the second procedure discussed by Linn and Werts (1969), which Darlington (1968) labeled "usefulness." For example, in Method 1 the sum of squares ascribed to the row main effect is

$$SS_T[R^2(A_i, B_j, AB_{ij}) - R^2(B_j, AB_{ij})],$$

where

$SS_T$ = total sum of squares in the dependent variable,
$A_i$ = the row main effect,
$B_j$ = the column main effect,
$AB_{ij}$ = the interaction effect,
$R^2(A_i, B_j, AB_{ij})$ is the proportion of variance in the dependent variable predictable from main effects and the interaction,
$R^2(B_j, AB_{ij})$ is the variance predictable from the column effect and interaction.
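A small numerical sketch can make the Method 1 computation concrete. Each effect's sum of squares is taken as the total sum of squares times the drop in R^2 when that effect is removed from the full model. The two 2 x 2 data sets, the +1/-1 effect coding, and every function name below are hypothetical illustrations rather than anything taken from Overall and Spiegel (1969); the regression is solved with plain normal equations and no singularity check.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of y regressed on an intercept plus the given predictor columns."""
    cols = [[1.0] * len(y)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    coef = solve(xtx, xty)
    fitted = [sum(bj * col[i] for bj, col in zip(coef, cols)) for i in range(len(y))]
    mean = sum(y) / len(y)
    ss_total = sum((v - mean) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1 - ss_resid / ss_total


def method1(cells):
    """Return ({effect: SS}, model SS) for a 2 x 2 design given {(row, col): scores}."""
    a, b, y = [], [], []
    for (row, col), scores in cells.items():
        for score in scores:
            a.append(float(row))
            b.append(float(col))
            y.append(float(score))
    ab = [u * v for u, v in zip(a, b)]
    mean = sum(y) / len(y)
    ss_t = sum((v - mean) ** 2 for v in y)
    full = r_squared([a, b, ab], y)
    parts = {
        "rows": ss_t * (full - r_squared([b, ab], y)),
        "columns": ss_t * (full - r_squared([a, ab], y)),
        "interaction": ss_t * (full - r_squared([a, b], y)),
    }
    return parts, ss_t * full


balanced = {(1, 1): [10, 12], (1, -1): [6, 8], (-1, 1): [5, 7], (-1, -1): [2, 4]}
unbalanced = {(1, 1): [10, 12, 11], (1, -1): [7], (-1, 1): [6], (-1, -1): [2, 3, 4]}

parts_bal, model_bal = method1(balanced)
parts_unb, model_unb = method1(unbalanced)
```

With equal cell sizes the three Method 1 components add up to the model sum of squares; with the disproportionate cell frequencies of the second data set they do not, because the coded row and column predictors are correlated.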


1 Requests for reprints should be sent to Charles E.
Werts, Developmental Research Division, Educational 
Testing Service, Princeton, New Jersey 08540. 
For purposes of causal analysis it is helpful to remember that "usefulness" is also the squared part correlation between an independent and a dependent variable with the linear "effect" of other independent variables removed from that independent variable. In the notation of this part correlation, the row main effect in Method 1 would be

$$SS_T[R^2_{y(A_i \cdot B_j, AB_{ij})}].$$


The rationale behind a causal interpretation of the part correlation is that the researcher wishes to remove spurious association due to the influence of antecedent variables. Applying to Method 1 the principle that the controlled variables are assumed to be antecedent:

1. The row main effect is

$$SS_T[R^2_{y(A_i \cdot B_j, AB_{ij})}],$$

that is, the column and interaction effects are antecedent to the row effect;
2. The column main effect is

$$SS_T[R^2_{y(B_j \cdot A_i, AB_{ij})}],$$

that is, the row and interaction effects are antecedent to the column effect; and
3. The interaction effect is

$$SS_T[R^2_{y(AB_{ij} \cdot A_i, B_j)}],$$

that is, the row and column effects are antecedent to the interaction effect.

It follows that when used for purposes of causal analysis, Method 1 involves a mutually contradictory set of assumptions about the causal ordering of the row, column, and interaction effects. This analysis applied to Method 2 shows that

1. The row main effect is $SS_T[R^2_{y(A_i \cdot B_j)}]$, that is, the column effect is antecedent to the row effect;




CAUSAL ASSUMPTIONS FOR LEAST SQUARES ANALYSIS 


2. The column effect is $SS_T[R^2_{y(B_j \cdot A_i)}]$, that is, the row effect is antecedent to the column effect.

Thus, Method 2 when used for causal analysis makes contradictory assumptions about the causal ordering of the row and column effects.
Method 3 which starts with an assumed causal 
ordering among effects is therefore the only 
one of the three methods which is causally 
consistent. The contradictory causal assump- 
tions in Methods 1 and 2 correspond to the 
fact that the various sources of variance do 
not add up to the total variance. In our 
opinion, Overall and Spiegel’s conclusion that 
their first method “should be used if con- 
ceptualization of the problem is in general 
linear regression terms [1969, p. 319]" should
include the disclaimer: “and when causal 
interpretations are not desired." 

The first, third, and fourth procedures dis- 
cussed by Linn and Werts (1969) also may be 
adapted to categorical variables by the use 
of dummy variables. Each of these procedures 
corresponds to a causal model, which a priori 
assumes a particular causal ordering of effects. 

Basically, causal analysis is an attempt to 
simulate some phenomena of interest. In this
sense, discussions of methods for causal analy- 
sis should take place in relation to a particular 
research problem, justifying the use of a
particular statistical procedure because the 
mathematical model underlying that procedure 
appears to be a reasonable approximation of 
the way reality works in that situation. Both 
Linn and Werts (1969) and Overall and 
Spiegel (1969) implicitly evade the justifi- 
cation issue by assuming that the mathematical 
model underlying their statistical procedures 
correctly simulates the reality to be investi- 
gated. The particular methods discussed by 
these authors make no provision for errors 
of measurement (i.e., it is implicitly assumed
there are no such errors), reciprocal causation, 
or for the possibility that two or more observed 
variables measure the same underlying causal 
factors (i.e., in the economist's language, there 
is no “simultaneity”). 
... as a response to noxious stimuli and ... of adrenaline, noradrenaline, ... and the glucocorticoids appear to be ... and noradrenaline are secreted early ... influence early acquisition of fear-motivated responses. ACTH and the glucocorticoids are secreted relatively late and influence late acquisition and extinction of such responses. Together these substances form a physiological system which appears to significantly influence all stages of acquisition and extinction of fear-motivated responses. This system may also play a significant part in the formation of stress-induced ulceration under certain conditions. Some explanations for the behavioral effects of these substances are suggested.


PHYSIOLOGICAL BACKGROUND 


While the physiological importance of 
neurohormones and neurotransmitter sub- 
stances has been apparent for some time, 
many of their behavioral implications remain 
unknown. In this review it is proposed to 
describe a physiological system involving one 
neurotransmitter and a number of neuro- 
hormones which appear to influence fear- 
motivated responses. The transmitter sub- 
stance concerned is noradrenaline which is 
synthesized and secreted centrally by parts 
of the hypothalamus, and peripherally by 
postganglionic sympathetic nerve endings in- 
cluding the adrenal medulla. The relevant 
hormones are adrenaline, synthesized and 
secreted by the adrenal medulla, adreno- 
corticotrophic hormone (ACTH) from the 
anterior lobe of the pituitary gland (adeno- 
hypophysis), and the glucocorticoids synthesized and secreted by the adrenal cortex.

1 Requests for reprints should be sent to Maurice G. King, School of Behavioral Sciences, Macquarie University, North Ryde, New South Wales 2113, Australia.

There are a number of essentially physiological reasons for choosing these substances for investigation in relation to fear-motivated responses. First, the secretion of each as a function of physical or psychological stress has been consistently observed, and together they form a large component of the somatic reaction to such stress (Selyé, 1950; Selyé & Heuser, 1956). In addition, psychological rather than physical stress appears to be more important in inducing such secretions (Davson & Eggleton, 1968; Friedman, Ader, Grota, & Larson, 1967; Mason, 1968a, 1968b). Second, the release of these substances by stressful stimuli (stressors) leads to changes in physiological function which are adaptive and often necessary for survival (Table 1). Third, either the synthesis or the stress-induced secretion of each of these substances depends on a complex physiological relationship in which each hormone may, to a certain degree, be regulated by the others (Figure 1). The hormones are thus dependent upon one another in this somatic stress cycle. Fourth, stress-induced secretion of these substances (in rats at least) follows certain significant temporal characteristics: peripheral catecholamine (adrenaline and noradrenaline) secretion is neurally mediated via the splanchnic nerves of the sympathetic nervous system, and their physiological effects are consequently evident almost immediately (Davson & Eggleton, 1968). On the other hand, ACTH release from the adenohypophysis is humoral and therefore slower, occurring within 10 seconds (Gray & Munson, 1951). Stress leads to the release of corticotrophin-releasing factors from the median eminence of the hypothalamus. These are in turn transported via the portal-hypophysial blood vessels to the anterior pituitary where they stimulate ACTH secretion (Bowman, Rand, & West, 1968). The mechanism by which
adrenaline leads to ACTH release is at present largely theoretical, and there is some controversy concerning whether this occurs in species other than the rat. It is likely that adrenaline stimulates peptides from the posterior lobe of the pituitary which in turn stimulate ACTH secretion from the anterior lobe (Bowman et al., 1968; de Wied, 1965; White, Handler, & Smith, 1964). Finally, release of glucocorticoids in significant quantities during stress requires from 15 to 60 minutes (Hodges & Jones, 1963). Thus, maximal secretion of the catecholamines, ACTH, and the relevant adrenal steroids follows a definite temporal pattern. In addition, the glucocorticoids may inhibit further ACTH release. Inhibition occurs when circulating steroid levels begin to fall, and maximum inhibition does not occur until such levels have returned to base line. The fall in circulating steroids does not occur for at least 1 hour after termination of stress (Ganong, 1967; Weiss, McEwen, Teresa, Silva, & Kalkut, 1969).

TABLE 1
PRINCIPAL PHYSIOLOGICAL AND PHARMACOLOGICAL EFFECTS OF THE CATECHOLAMINES, ACTH, AND ADRENAL STEROIDS

Substance: Central noradrenaline
Effects: Drug-induced decreased and increased levels associated with sedation and excitement, respectively, in both man and animals (Schildkraut & Kety, 1967).
General adaptive functions: Influences alertness and behavioral arousal in a relatively nonspecific manner.

Substance: Peripheral noradrenaline
Effects: Transmitter substance in postganglionic sympathetic nerve endings; maintains blood pressure (Davson & Eggleton, 1968); produces reticular activating system (RAS) arousal (Stein, 1967).
General adaptive functions: Maintains reflex excitability of sympathetic portion of autonomic nervous system; protects organism from acute effects of hemorrhage.

Substance: Central adrenaline
Effects: Not synthesized centrally in significant quantities.

Substance: Peripheral adrenaline
Effects: Produces RAS arousal (Stein, 1967); promotes sweating; moves blood from skin to muscles; raises free fatty acid level and mobilizes blood glucose; inhibits muscular activity in abdominal viscera and increases output of heart (Davson & Eggleton, 1968; Ganong, 1967).
General adaptive functions: Skin more difficult to penetrate; decreases bleeding and allows greater muscle output; (quick) energy factor.

Substance: Central ACTH
Effects: ACTH and other pituitary peptides present in hypothalamus, but their function is unknown (Guillemin, Schally, Lipscomb, Anderson, & Long, 1962).

Substance: Peripheral ACTH
Effects: Stimulates both fatty acid release by adipose tissue and glucose utilization (White, Handler, & Smith, 1964).
General adaptive functions: Energy factor.

Substance: Central steroids
Effects: Adrenal steroids not synthesized centrally.

Substance: Peripheral steroids
Effects: These may be divided into mineralocorticoids and glucocorticoids. The major mineralocorticoid is aldosterone which controls electrolyte and water metabolism; the major glucocorticoids are corticosterone and hydrocortisone which control carbohydrate and protein metabolism. Glucocorticoids must be present for the catecholamines to exert their calorigenic action and their vascular effects (Davson & Eggleton, 1968; Ganong, 1967).
General adaptive functions: Relatively slow acting, long-term energy factors, especially as regards muscular energy. Adrenal steroids are critical for life maintenance and somatic resistance to stress.

Such physiological phenomena appear to 
complement behavioral data concerning hor- 
mones and fear-motivated responses. The pos- 
sible significance of the end-loop in Figure 1 
is discussed independently in a later section. 


BEHAVIORAL BACKGROUND 
Catecholamines and Behavior 


The catecholamines generally, and central 
noradrenaline particularly, appear to be closely 
related to behavioral arousal (Table 1). In 
addition, noradrenaline effects seem to be 
more strongly associated with a physiological 
response resembling anger or aggression rather 
than fear, while the reverse is true of adren- 
aline. Thus, 

Increased epinephrine [adrenaline] excretion seems 
to occur in states of anxiety or in threatening situ- 
ations of uncertain or unpredictable nature in which 
active coping may be required but has not been 
achieved. In contrast, norepinephrine [noradrenaline]
excretion may occur in states of anger or aggression 
or in situations which are challenging but predictable 
and which allow active and appropriate behavioral 
responses to the challenge. Under various conditions 
increase of either epinephrine or norepinephrine or 
of both of these catecholamines may represent specific 
adaptive responses . . . [Schildkraut & Kety, 1967, p. 8].

In a series of experiments, Brady and his 
associates (1967) demonstrated that adrena- 
line levels are increased in ambiguous 
threatening situations while noradrenaline 
levels are increased in nonambiguous threaten- 
ing situations as well. A possible implication 
of this is that catecholamine secretion may 
be associated with increased behavioral 
arousal during acquisition of fear-motivated 
responses. Any adrenaline-induced effects 
would appear to be maximal during early 
acquisition of such behavior, at which time 
ambiguity concerning the required coping re- 
sponse would also be maximal. On the other 
hand, noradrenaline appears to be associated 
with the entire acquisition process. 
While the examination of secretion or excretion levels of adrenaline and noradrenaline in
particular situations is a useful technique, it 
does not allow conclusions concerning actual 
"causes" to be drawn. Other techniques have 
been used to investigate catecholamine effects 
on fear-motivated responses, however. These 
may be divided into those which have artifi- 
cially increased peripheral amine levels by 
injection, or decreased peripheral levels by 
demedullation, sympathectomy, or adrenalec- 
tomy. In addition, catecholamine effects were 
blocked in one study by injecting an adrenergic 
blocking drug, and in another by blocking 
the response of the entire autonomic nervous 
system. Since adrenalectomy involves removal 
of the adrenal cortex as well as the medulla, 
studies in which this technique was employed 
are reviewed in a following section on ACTH 
and the steroids. 

Amine increase and fear-motivated re- 
sponses. No studies have been carried out in 
which noradrenaline was injected. A number of studies, however, have increased adrenaline levels by this means. These may be conveniently classified into two groups: those which have measured adrenaline effects on "innate" fear indexes such as defecation, and those which have used the usual learned indexes of fear such as escape and avoidance responding. The former, which are not universally accepted as using reliable measures of fear, are dealt with first.

In general, adrenaline injections alone have 
had no effect on fear responses in rats
(Sharpless, 1961; Singer, 1963). When adren- 
aline injections were associated with a variety 
of noxious stimuli, however, there were sig- 
nificant increases on such measures as defeca- 
tion, urination, and trembling (Leventhal & 
Killackey, 1968; Singer, 1963). In addition, 
the Leventhal and Killackey study demon- 
strated that rats that were given both hor- 
mone and noxious stimulation chose a familiar 
compartment rather than a novel one; a fur- 
ther index of increased fear. Finally, Kamano 
(1968) showed that adrenaline-injected rats 
spent less time in a compartment previously 
associated with shock. These findings are 
consistent with the Schachter and Singer 
(1962) hypothesis that administration of 
adrenaline may produce a nonspecific state 
of arousal, and that past experiences of the
subject and the characteristics of the experi- 
mental situation may be the factors which 
determine the quality and intensity of the 
elicited emotions. 

Only one study that has investigated the 
effects of injected adrenaline on a learned 
fear-motivated response obtained significant 
results. Latané and Schachter (1962) demon- 
strated that a small dose of injected adrena- 
line significantly enhanced two-way shuttle- 
box avoidance acquisition in rats, while a 
larger dose retarded acquisition. A replica- 
tion of this study by Stewart and Brookshire 
(1967), however, did not obtain the enhance- 
ment effect. Other studies, which have exam- 
ined the effects of different doses of adrena- 
line on learned responses motivated by fear, 
have obtained either no effect or retarded 
performance in both acquisition (Kosman & 
Gerard, 1955; Moyer & Bunnell, 1958; Sines, 
1959; Stewart & Brookshire, 1968), and 
extinction (Leshner & Stewart, 1966). 

This pattern of results with the injection 
technique appears to refute the hypothesis 
that increased adrenaline and learned re- 
sponses motivated by fear are positively 
related. The technique has at least one serious 
deficiency, however. Kosman and Gerard 
(1955) were the first to note that adrenaline 
injections may lead to locomotor impairment. 
This is hardly surprising when it is considered 
that in an aversive situation the adrenal 
medullae of the organism are already secreting 
optimum amounts of adrenaline. Thus, any 
further administration of the hormone may 
lead to physiological debilitation. Such debili- 
tation increases with increasing dose of adren- 
aline. The injection technique is consequently 
regarded as unsuitable for examining the effects of adrenaline on learned fear responses.

Amine decrease or blocking and fear-motivated responses. Three studies have investigated the effects of adrenaline depletion on avoidance acquisition in rats. All used the
demedullation technique which eliminates the 
source of adrenaline, while not significantly 
reducing noradrenaline stores (Ganong, 1967). 
Moyer and Bunnell (1958) found that this 
had no effect on one-way shuttle-box avoid- 
ance responding, while Levine and Soliday 
(1962) and Conner and Levine (1969) found 
that demedullation retarded acquisition of a
two-way shuttle-box response. It is possible 
that the different results obtained by these 
studies are due to their different procedures. 
On the other hand, as Levine and Soliday 
(1962) noted, the adrenal cortex may require 
as much as 4 weeks to recover from the 
physical insult which occurs during demedul- 
lation; the Moyer and Bunnell study allowed 
for only a fraction of this required time, and 
their procedure could therefore have produced 
experimental effects more in keeping with 
adrenalectomized rather than demedullated 
rats. In the Levine and Soliday study, the 
greatest difference in performance between 
the operated and control groups occurred on 
the last 30 of a 90-trial acquisition series. At 
first this does not appear to be consistent 
with the hypothesis that adrenaline has its 
primary effect on early rather than late acqui- 
sition. An explanation for this apparent in- 
consistency, however, may be in the use of a 
two-way avoidance response as the measuring 
instrument. This is normally a far more dif- 
ficult task for a rat to learn than a one-way 
response (Kenyon & Krieckhaus, 1965). 
Consequently, performance on the first 60 
trials may have been so low for both control 
and experimental groups that no differentia- 
tion was possible at this stage of acquisition. 
The data presented by Levine and Soliday 
appear to confirm this. The hypothesis is con- 
sistent, however, with data presented by 
Moyer and Korn (1965) who showed that 
demedullation had no effect on retention of 
an avoidance response. 

Wynne and Solomon (1955) investigated 
the effects of sympathectomy on shock avoid- 
ance performance in a shuttle-box situation 
with dogs. Their surgical procedure signifi- 
cantly decreased the availability of adrena- 
line and noradrenaline. It was found that 
sympathectomy before acquisition retarded 
escape learning as well as the onset of the 
first avoidance response, and decreased performance in extinction. If sympathectomy was carried out after an acquisition criterion had been reached, however, and before extinction trials had begun, there was no effect on extinction performance.

Kosman and Gerard (1955) found that the injection of dibenzyline, which inhibits the excitatory effects of both adrenaline and noradrenaline, had no effect on lever-press avoidance in rats. The drug was injected after the response had been well-learnt. Arbit (1958)
found that injection of tetraethylammonium, 
an autonomic nervous system ganglion-block- 
ing drug, decreased performance on a shock- 
motivated serial discrimination learning task 
in rats. While the drug was injected prior to 
the commencement of acquisition trials, spe- 
cific conclusions concerning amine effects can- 
not be drawn from this study, since the drug 
blocked the parasympathetic nervous system 
response as well as the sympathetic response. 


Conclusions 


The data suggest that adrenaline and fear 
responses of an “innate” or reflexive nature 
are associated. Only meager evidence exists, 
however, which demonstrates any significant 
relationship between this substance and 
learned responses motivated by fear. It is 
possible, therefore, that adrenaline may influ- 
ence only “innate” fear responses. The alter- 
native is that the effects of adrenaline on 
conditioned fear-motivated responses have not 
been consistently demonstrated due to inade- 
quate methodology in an area which requires 
particularly efficient techniques. 

No studies have attempted to examine what 
independent part noradrenaline plays in 
learned fear-motivated behavior. That it may 
play some part is suggested by strong evi- 
dence that this substance is associated with 
behavioral arousal. In addition, noradrenaline 
has a number of physiological effects in com- 
mon with adrenaline (Ganong, 1967). 

Existing data indicate that the cate- 
cholamines may be important at least during 
early acquisition of learned responses moti- 
vated by fear. Adrenaline may have little in- 
fluence, however, on later acquisition, reten- 
tion, and extinction. In two-factor theory it is 
accepted that during early acquisition of an 
avoidance response, fear (however defined) of 
the conditioned stimulus (CS) is classically 
conditioned by means of conditioned stimulus- 
unconditioned stimulus (CS-UCS) presenta- 
tions. Further, in this early period amine
secretions and their physiological effects
are maximal (Brady, 1967). It is consequently
hypothesized that these peripheral physio-
logical effects (increased heart rate, for ex- 
ample), may become classically conditioned 
to the CS and act as stimulus-producing 
responses for the fear-motivated response. In 
fact, it has been clearly demonstrated by 
Black that the CS in an aversive conditioning 
situation can reliably influence heart rate 
(reported by Solomon and Brush, 1956, pp. 
282-284). During later acquisition and extinc- 
tion when the UCS does not follow the CS
with any high degree of probability, such 
stimulus-producing responses may extinguish 
and play little part in the relevant behavior. 
It is interesting to note that adrenaline secre- 
tion returns to base level during this time 
(Brady, 1967). This may be due to the 
achievement of active coping behavior. As 
Wynne and Solomon (1955) pointed out, 
"after the organism has initially succeeded in 
finding a method of avoidance, continued 
autonomic upset may interfere with orderly 
and consistent avoidance by exciting random 
and uncoordinated activity [p. 280]." Such 
random activity could be adaptive, however,
in the early “trial and error” process of
learning to avoid. 


ACTH, Steroids (Mainly Glucocorticoids), 
and Behavior 


While behavioral effects of the catechol- 
amines can be discussed and researched inde- 
pendently of ACTH and the steroids, this is 
not true of ACTH or of the steroids. Figure 1 
shows that ACTH secretion may be governed 
by two mechanisms: adrenaline secretion and 
corticotrophin-releasing factors. Because of 
the alternative mechanism available here, 
eliminating the adrenal medulla would not 
lead to marked changes in ACTH secretion 
during stress. Glucocorticoid release, however,
depends entirely on ACTH secretion. In ad-
dition, ACTH release is to an extent con-
trolled by circulating glucocorticoid levels. It
is therefore difficult to vary either ACTH or
glucocorticoids without at the same time vary-
ing the other. Behavioral effects of these
hormones should consequently be discussed in
terms of one of the following four possible
relationships: (a) high ACTH-high gluco-
corticoid levels, (b) low ACTH-low gluco-
corticoid levels, (c) low ACTH-high gluco-
corticoid levels, and (d) high ACTH-low
glucocorticoid levels. Since this has not been 
generally noted in the literature, most studies 
do not classify easily into one of these four 
categories. The classes can be utilized, how- 
ever, to organize the large amount of data 
available in this area. 
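
The four-way scheme above can be sketched as a small lookup, which may help when sorting a given experimental manipulation into one of the categories. This is only an illustrative sketch: the names `HormoneState`, `classify`, and `EXAMPLES` are hypothetical, and coding each hormone level as a simple high/low boolean is an assumption made for the example, not a procedure taken from the studies reviewed. The example manipulations are those discussed under each category in this review.

```python
from enum import Enum


class HormoneState(Enum):
    """The four ACTH-glucocorticoid relationships, labeled (a) to (d)."""
    HIGH_ACTH_HIGH_GLUCO = "a"
    LOW_ACTH_LOW_GLUCO = "b"
    LOW_ACTH_HIGH_GLUCO = "c"
    HIGH_ACTH_LOW_GLUCO = "d"


def classify(acth_high: bool, gluco_high: bool) -> HormoneState:
    """Map a (hypothetical) high/low coding of each hormone to a category."""
    if acth_high and gluco_high:
        return HormoneState.HIGH_ACTH_HIGH_GLUCO
    if not acth_high and not gluco_high:
        return HormoneState.LOW_ACTH_LOW_GLUCO
    if gluco_high:
        return HormoneState.LOW_ACTH_HIGH_GLUCO
    return HormoneState.HIGH_ACTH_LOW_GLUCO


# Illustrative manipulations, coded as (ACTH high?, glucocorticoid high?).
# The boolean coding is an assumption made for this sketch.
EXAMPLES = {
    "stressed intact subject": (True, True),
    "hypophysectomy": (False, False),
    "glucocorticoid injection": (False, True),
    "adrenalectomy": (True, False),
}

if __name__ == "__main__":
    for name, (acth, gluco) in EXAMPLES.items():
        print(f"{name}: category ({classify(acth, gluco).value})")
```

A real study would rarely fit a clean binary coding, which is the point made in the text: most published designs vary both hormones at once.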

Effects of high ACTH-high glucocorticoid 
levels on learned fear-motivated responses. 
This category includes intact-unoperated sub- 
jects that have been stressed, as in avoidance 
training, for at least 1 hour, and subjects 
that have been injected with ACTH prior to 
such stress. It should be noted here that high 
plasma steroid levels following ACTH injec- 
tion do not significantly inhibit further ACTH 
release during stress (de Wied, 1964b; Hodges 
& Jones, 1963). In addition, since circulating 
steroid levels during stress are highly in- 
dicative of ACTH levels at this time, plasma 
measures of such steroids during stress are 
indicative of the possible physiological and 
behavioral effects of both. This has not been 
adequately recognized by some researchers 
who have attributed experimental behavioral 
effects to steroids, when in fact ACTH may 
also have been implicated. 

Thus higher plasma steroid levels and 
better passive avoidance performance in dogs 
have been associated (Lissák & Endröczi,
1961). Higher plasma steroid levels and 
greater conditioned suppression in monkeys 
have also been related (Brady, 1967). In 
addition, a positive correlation between
plasma steroid levels and active avoidance 
performance in rats has been consistently 
demonstrated (Bohus & Endröczi, 1964;
Brush & Levine, 1966; King, 1969; Levine 
& Brush, 1967; Wertheim, Conner, & Levine, 
1969). Sidman, Mason, Brady, and Thatch 
(1962) have shown that shock frequency and 
lever-pressing rate independently serve to vary 
steroid levels in monkeys on a series of bar- 
press avoidance tasks. Therefore increased 
ACTH-steroid levels may be induced by 
“effort” as well as shock stress. This indi- 
cates that such increased levels may also be 
positively correlated with performance on 
tasks not involving aversive conditioning. 

Studies which have injected ACTH, rather 
than taking plasma measures, do not present
clear-cut results in acquisition of fear-
motivated responses. Thus, Murphy and
Miller (1955) demonstrated no effect on 
acquisition of an active avoidance response in 
rats injected with ACTH. On the other hand, 
injection of ACTH has been found to enhance 
jump-up and shuttle-box avoidance perform-
ance (Bohus, Nyakas, & Endröczi, 1968;
Levine & Brush, 1967), and Sidman avoid- 
ance performance (Wertheim, Conner, & 
Levine, 1967). While injection in the Murphy 
and Miller study occurred before acquisition 
trials had begun, injection in the other studies 
occurred after the relevant coping response 
had been either partially or fully acquired. 
This may be a crucial difference since there 
is some evidence that injections of ACTH 
may be inhibitory on a complex task or early 
in learning, and facilitatory on an easy task, 
or late in learning (Korányi, Endröczi,
Lissák, & Szepes, 1967). These possible ef- 
fects may have been additive in the Murphy 
and Miller study, and thus led to no signifi- 
cant results in acquisition. 

Effects of injected ACTH on extinction 
performance in both rats (Bohus et al., 1968; 
Murphy & Miller, 1955) and mice (Korányi 
et al., 1967) have been examined. Whether
injection occurs during acquisition or extinc- 
tion, the active avoidance response is pro- 
longed in extinction. Finally, injected ACTH 
has enhanced both passive avoidance of a 
bar-press response for water in rats (Levine 
& Jones, 1965) and suppression of an explora- 
tory response in mice (Korányi et al., 1967). 
These latter results are not surprising when 
it is considered that resumption of a re- 
sponse in passive avoidance is similar to 
extinction of an active avoidance response. 

The conclusion which may be drawn from
this group of studies is that high ACTH-high
glucocorticoid levels reliably serve to prolong 
learned fear-motivated responses. In addition, 
such levels appear to retard or enhance 
acquisition of these responses depending on 
the difficulty of the task or the stage of 
acquisition. Further work is required to
substantiate this possibility, however.

Effects of low ACTH-low glucocorticoid
levels on learned fear-motivated responses.
(For convenience this group of studies also
includes those in which ACTH-glucocorticoid
levels were boosted by ACTH injection, after
depletion of these hormones by various
means.) It has already been mentioned that 
plasma steroid levels are positively correlated 
with active avoidance performance. Other 
methods which have been used for examining 
the effects of reduced levels of ACTH and 
glucocorticoids on learned fear responses are 
the complete surgical removal of the pituitary 
gland (hypophysectomy), or removal of the 
posterior lobe of the pituitary (posterior 
lobectomy), or removal of the anterior por- 
tion of this gland (adenohypophysectomy). 
Before summarizing the results of this group 
of studies it is important to note some sig- 
nificant “side effects" which are a result of 
these techniques. First, posterior lobectomy 
not only leads to a reduction in ACTH secre- 
tion from the anterior lobe during stress (de 
Wied, 1965) but to the removal of posterior 
lobe hormones as well. These include oxytocin, 
vasopressin, α- and β-melanocyte stimulating
hormones. Adenohypophysectomy not only 
eradicates the source of ACTH (and conse- 
quently glucocorticoid secretion), but also the 
source of growth, thyrotrophic, luteinizing, 
lactogenic, and follicle-stimulating hormones. 
Hypophysectomy, of course, removes the 
source of all these hormones. Removal of 
thyrotrophic hormone which leads to a drastic 
reduction in thyroid secretion may be espe- 
cially confounding, since this leads to a gen- 
erally reduced metabolic rate (Bowman et 
al., 1968). Finally, any technique which
brings about a drastic depletion of adrenal 
steroids also leads to a significant decrease 
in adrenaline production (Wurtman & Axel- 
rod, 1965) and prevents several physiological 
actions of secreted catecholamines (Table 1). 
Certainly these surgical techniques are any- 
thing but specific, and little is known 
concerning the possible behaviorally confound- 
ing effects of removal of hormones other 
than ACTH. 

Hypophysectomy and adenohypophysectomy,
but not posterior lobectomy, lead to a retarda-
tion of escape and avoidance acquisition in
rats. This retardation effect is reversed by 
injections of ACTH or a mixture of anterior 
pituitary and adrenal cortex hormones con- 
sisting of cortisone, testosterone, and thyrox- 
ine (Applezweig & Moeller, 1959; de Wied,
1964a, 1964b, 1965). Hypophysectomy, ade-
nohypophysectomy, and posterior lobectomy
all facilitate extinction of an active avoidance 
response. This effect is also reversed by 
injection of ACTH in hypophysectomized rats 
(Weiss et al., 1969), or ACTH or posterior 
lobe hormones in posterior lobectomized rats 
(de Wied, 1965), or a mixture of cortisone, 
testosterone, and thyroxine in adenohypo- 
physectomized rats (de Wied, 1965). Finally,
there is less shock-induced suppression of an 
exploratory response in hypophysectomized 
rats than in controls (Weiss et al., 1969). 

Within the limitations of the techniques 
used in this group of studies, several conclu- 
sions may be drawn. First, while it is further 
confirmed that the ACTH-glucocorticoid sys- 
tem influences conditioned fear responses, no 
definite conclusions can yet be drawn con- 
cerning the part separately played by ACTH 
and by the glucocorticoids. It is tempting to 
suggest that some of the behavioral effects 
brought about by ACTH injection were due 
to this hormone alone. This is because hypo- 
physectomy or adenohypophysectomy would 
lead to some adrenal cortex atrophy in the 
time required for the animals to recover from 
these operations. Thus, steroid release as a
response to ACTH injection could have been 
significantly decreased in such operated sub- 
jects. No data concerning adrenal cortex 
atrophy were presented in the above studies,
however.

Second, it can be concluded that ACTH 
per se is not critical for acquisition of fear- 
motivated responses if certain other hormones 
are present. It is suggested by de Wied (1965)
that different hormones may have similar be- 
havioral effects. Thus, increase in others may 
to an extent compensate for a lack of ACTH. 

Finally, since posterior lobectomy reduces 
stress-induced ACTH release from the ante- 
rior pituitary, and since this procedure had no 
effect on acquisition but facilitated extinction
of an active avoidance response, it may be
concluded that extinction performance is more 
sensitive to ACTH levels than acquisition 
performance. 

Effects of low ACTH-high glucocorticoid 
levels on learned fear-motivated responses.
This category includes studies in which sub-
jects have been injected with a glucocorticoid
either with or without previous hypophysec-
tomy. Injection of either hydrocortisone or
dexamethasone (a powerful synthetic gluco- 
corticoid) in unoperated rats has been found 
to enhance shuttle-box avoidance (Levine & 
Brush, 1967) and bar-press avoidance per- 
formance (Wertheim et al., 1967). Injection 
in these studies occurred after the relevant 
response had been either partially or fully 
acquired. On the other hand, cortisone injec- 
tions prior to any acquisition failed to affect 
performance on a jump-up avoidance task 
(Bohus & Lissák, 1968). 

Two studies have included hypophysectomy 
in their design as well as the injection pro- 
cedure. Hypophysectomies were carried out 
on all subjects including the controls. This 
controlled for the possible confounding effects 
of removal of "irrelevant" hormones. In the 
first study by de Wied (1966), it was demon- 
strated that injection of either corticosterone 
or dexamethasone facilitated extinction of an 
active avoidance response. In the second study 
by Anderson, Winn, and Tam (1968), it 
was shown that hydrocortisone had no effect 
on a passive bar-press avoidance response 
for food. 

The conclusions to be drawn from these 
studies are, first, that glucocorticoids inde- 
pendently enhance performance on fear- 
motivated tasks when the relevant coping 
response has been at least partially acquired. 
Second, glucocorticoids independently facili- 
tate extinction of a shuttle-box avoidance 
response, but have no effect on a passive 
avoidance response. Since resumption of a 
bar-press response in a passive avoidance 
situation is similar to extinction of an active 
avoidance response, the latter findings may 
appear somewhat contradictory. An explana- 
tion for these results, however, may lie in 
the fact that bar-press responding could de- 
pend predominantly on visual rather than 
olfactory cues, while the reverse is true for 
one-way shuttle-box avoidance responding 
(Cairncross & King, 1969 *; King, 1969). In 
addition, there is evidence that low circulating 
adrenal steroid levels are correlated with in- 
creased olfactory sensitivity in the rat 

g, 1967). On a shuttle-box avoidance 
to be important for performance, increased
steroid levels may have the reverse effect on 
olfactory sensitivity and thus decrease per- 
formance. This hypothesis is currently being 
tested by King and Cairncross. The inverse 
relationship between olfactory sensitivity and 
plasma glucocorticoid levels has also been 
put forward (King, 1969) as a two-factor 
explanation of the Kamin effect. 

Since glucocorticoids have no independent 
effect on passive avoidance, but facilitate 
active shuttle-box avoidance extinction, it 
would appear that these hormones do not
enhance performance on all tasks involving 
aversive conditioning. 

Effects of high ACTH-low glucocorticoid
levels on learned fear-motivated responses.
This combination of hormone levels may be
achieved by a number of different methods.
The first follows from the fact that the ACTH 
molecule consists of a sequence of 39 amino 
acids, and the presence of only the first 
sequence of 13 amino acids is necessary for 
the ACTH molecule to exert an influence on 
the adrenal cortex. It has been shown that 
the molecule containing either the 1-10 or 
4-10 sequence of amino acids inhibits extinc- 
tion of both a shuttle-box avoidance response, 
and a pole-jumping avoidance response in 
rats (Bohus & de Wied, 1966; de Wied & 
Pirie, 1968; Greven & de Wied, 1967). Injec- 
tion of the decapeptide sequence, which is 
present and similarly active in α- and β-
melanocyte stimulating hormone, did not af-
fect escape acquisition, however (Bohus & de 
Wied, 1966). 

If hypophysectomy is performed, and sub- 
jects are not tested until a sufficient time 
has elapsed for adrenal cortex atrophy to 
occur, injections of ACTH will not lead to 
significant adrenal steroid secretion. Anderson 
et al. (1968) used this technique to demon- 
strate the independent performance enhancing 
effects of ACTH on a passive bar-press 
avoidance response for food. 

The final and most extensively used tech-
nique is the total removal of the adrenal
glands (adrenalectomy). Adrenalectomy leads
to a chronic increase in circulating levels of
ACTH, greater ACTH secretion during stress,
and eliminates the source of adrenal steroids
(Cox, Hodges, & Vernikos, 1958; Ganong,
1967; Hodges & Jones, 1964).

Moyer (1958a) demonstrated that adrenal-
ectomy had no effect on acquisition or extinc- 
tion of an escape response in rats. Applezweig 
and Moeller (1959) and Bohus and Lissák 
(1968) found that adrenalectomized rats per-
formed similarly to controls on acquisition of
an active avoidance response. Miller and 
Ogawa (1962) combined adrenalectomy and 
injections of ACTH and obtained no effect
in acquisition of an active avoidance response
in comparison to adrenalectomized controls. 
The groups injected with ACTH during acqui- 
sition, however, made significantly more re- 
sponses in extinction. In addition, Bohus and 
Lissák (1968), de Wied (1967), and Weiss 
et al. (1969) demonstrated that adrenalec- 
tomy per se resulted in a marked inhibition 
of active avoidance extinction. This effect 
was reversed by hypophysectomy or injections
of corticosterone. Finally, Weiss et al. (1969)
found that adrenalectomized rats performed
significantly better than controls on a task
which required passive avoidance of an 
exploratory response. 

These studies suggest that ACTH inde- 
pendently serves to prolong learned fear- 
motivated responses. In addition, adrenalec- 
tomy in rats leads to significantly greater 
emotional elimination in the open field 
(Moyer, 1958b). It is therefore tempting to 
hypothesize at this point that ACTH en- 
hances “fear,” but if so there appears to be 
a contradiction in the data; if ACTH en- 
hances the fear drive (however defined), in- 
creased ACTH levels should lead to better 
performance in acquisition as well as extinc- 
tion. Experiments which have investigated the 
independent effects of ACTH on escape and 
active avoidance acquisition have all obtained 
nonsignificant results, however. On the other 
hand, it is generally accepted that escape 
acquisition depends more on the UCS as a 
forcing stimulus than on “fear.” In addition,
in all of the active avoidance studies which 
looked at this question, adrenalectomies were
carried out. This technique may be a special
case; adrenalectomy leads to the removal of
the medulla as well as the cortex, and there
is evidence that demedullation leads to re-
duced active avoidance performance (Conner
& Levine, 1969; Levine & Soliday, 1962). The
implication therefore is that either increased 
ACTH levels or removal of the cortex com- 
pensates to produce a nonsignificant effect. 
Previously presented evidence suggests that 
adrenal steroids independently enhance active 
avoidance performance under certain condi- 
tions, however. Thus increased ACTH levels, 
and not removal of the cortex, would appear 
to be the compensatory mechanism. It is evi- 
dent from the behavioral data, and from what 
was previously pointed out concerning the 
temporal relationships between the hormones, 
that research into the independent effects of 
ACTH and glucocorticoids on various stages 
of active avoidance acquisition is needed be-
fore more definite conclusions can be reached. 
In addition, studies which use adrenalec- 
tomized subjects in the experimental group
must use demedullated and not sham-operated 
subjects in the control group. 


Conclusions 


There is strong evidence that ACTH inde- 
pendently prolongs learned responses gen- 
erally assumed to be motivated by “fear.” 
Although the hormone may have some effect 
on memory, the most parsimonious explana- 
tion for this behavioral effect at the present 
time is that ACTH enhances the fear drive. 
More detailed research on a greater variety 
of tasks than have hitherto appeared is needed 
to clarify this point. On the other hand, 
glucocorticoids usually facilitate the extinc- 
tion of learned fear responses. Since they have 
no independent effect on passive bar-press 
avoidance, however, it cannot be postulated 
that their mode of action is always to reduce 
“fear,” possibly by inhibiting ACTH. Using 
conditioned suppression as an independent 
monitor of fear, Kamin, Brimer, and Black 
(1963) showed that extinction of an avoid- 
ance response may occur in the presence of 
high levels of fear. It is likely that either 
decreases in fear or increases in competing 
responses lead to extinction of avoidance re- 
sponses (Dua, 1969). The glucocorticoids may 
consequently produce their effects in extinc- 
tion by increasing such competing responses.
One way in which they may do this is by 
decreasing olfactory sensitivity and thus ol- 
factory cues. In addition, recent evidence
suggests that the glucocorticoids may facili-
tate extinction by affecting certain basal
brain areas (Bohus, 1968; Van Wimersma
Greidanus & de Wied, 1969). It is possible
that these two factors may be related.

Fig. 1. Excitatory and inhibitory pathways of a physiological system
influencing fear-motivated responses.

Finally, it is interesting to note that in- 
creased glucocorticoid levels, or increased glu- 
cocorticoid and ACTH levels, are positively 
correlated with better active avoidance per- 
formance when the avoidance response has 
been at least partially acquired. The cate- 
cholamines, on the other hand, appear to en- 
hance such performance at least until the 
relevant coping response has been at least 
partially acquired.

Figure 1 therefore describes a physiological 
system which to an extent may influence 
various stages of acquisition and extinction of 
learned fear-motivated responses. The cate- 
cholamines appear to be associated with initial 
energy, arousal, and perhaps “fear,” while 
ACTH may provide further motivation and 
the glucocorticoids long-term energy and the 
“switch-off mechanism." If both adrenaline 
and ACTH enhance the fear drive, they prob- 
ably do so by different mechanisms. This is 
due to the fact that ACTH does not appear 
to supply significant peripheral “fear cues,”
which adrenaline does.


Significance of the End Loop


The behavioral significance, if any, of the
end loop (Figure 1) is entirely unexplored.
There are clues, however, as to its physio- 
logical significance. The end loop involving 
the adrenal steroids, adrenaline, and ACTH 
requires at least 1 hour of stress to achieve 
maximal significance in the rat. This is due 
to the fact that maximal steroid release does 
not occur until approximately this amount 
of time has elapsed (Hodges & Jones, 1963). 
Within the loop, steroids regulate the produc- 
tion of adrenaline from noradrenaline in the 
adrenal medulla via the enzyme, phenyl- 
ethanolamine-N-methyltransferase (Wurtman 
& Axelrod, 1965). In turn, adrenaline may 
lead to ACTH secretion which leads to fur- 
ther steroid release (Bowman et al., 1968; 
Kitay, Holub, & Jailer, 1959). Once the 
stressor has initiated this sequence, the loop 
may be terminated by (a) removal of the 
stressor, and (b) a significant fall in steroid
release which acts to inhibit further ACTH, 
thus “switching off" the system. There are 
two important facts concerning this switch- 
off mechanism. First, while the plasma half- 
life of circulating ACTH is only about 10 
minutes, that of the glucocorticoids is from 
30 to 90 minutes (Ganong, 1967; Levine & 
Brush, 1967). Second, glucocorticoid secretion 
reaches a “ceiling” level which cannot be 
raised by increasing the intensity of stress 
or by increasing circulating ACTH by injec- 
tion. Any of these factors, however, prolongs 
maximal steroid secretion (Ganong, 1967; 
Jones & Stockham, 1966). Thus, greater 
initial stress subsequently leads to maximum
steroid (mainly glucocorticoid) secretion for 
a longer time after stress termination. One 
implication of this is that the end loop may 
remain physiologically significant for some 
time after stress. In addition, the degree of 
such significance is dependent on the degree 
of initial stress. It is possible therefore that 
this set of events may be responsible for at 
least one of the poststress effects (loss
of weight, changes in body temperature, in- 
creased stomach acid) observed by several 
experimenters (Weiss, 1968).
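
The difference in plasma half-lives quoted above (about 10 minutes for ACTH, 30 to 90 minutes for the glucocorticoids) can be made concrete with a short calculation. The sketch below assumes simple first-order (exponential) clearance with no further secretion after stress ends; that model and the function name `fraction_remaining` are assumptions introduced for illustration and are not taken from the studies cited. In particular, it ignores the prolonged maximal secretion that follows more intense stress.

```python
def fraction_remaining(minutes_elapsed: float, half_life_min: float) -> float:
    """Fraction of a hormone still circulating, assuming first-order decay."""
    return 0.5 ** (minutes_elapsed / half_life_min)


# Half-lives quoted in the text; the decay model itself is an assumption.
ACTH_HALF_LIFE_MIN = 10.0
GLUCO_HALF_LIFE_RANGE_MIN = (30.0, 90.0)

if __name__ == "__main__":
    for t in (10, 30, 60):
        acth = fraction_remaining(t, ACTH_HALF_LIFE_MIN)
        gluco_low = fraction_remaining(t, GLUCO_HALF_LIFE_RANGE_MIN[0])
        gluco_high = fraction_remaining(t, GLUCO_HALF_LIFE_RANGE_MIN[1])
        print(f"{t:>3} min: ACTH {acth:.3f}, "
              f"glucocorticoid {gluco_low:.3f} to {gluco_high:.3f}")
```

Under these assumptions, one hour after secretion stops less than 2% of the circulating ACTH would remain, whereas between roughly 25% and 63% of the glucocorticoid would still be present, which is consistent with the suggestion that the end loop stays physiologically significant for some time after stress.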

The change in stomach acid is a particu- 
larly interesting phenomenon; there is evi- 
dence that glucocorticoids and ACTH in- 
crease gastric acid and pepsin secretion in 
some species (Ganong, 1967). In addition, 
there is evidence that they alter the resistance 
of the mucosa to the irritant actions of 
gastric secretions (Ganong, 1967). Therefore, 
an additional poststress effect, mediated by 
the end loop, may be stomach ulceration.
There are further indirect behavioral data 
which tend to confirm this hypothesis. Thus 
higher basal steroid levels and better avoid- 
ance performance in rats have been associated 
(Wertheim et al., 1969), and better avoid- 
ance performance and greater susceptibility to 
ulceration have been related (Sines, Cleeland,
& Adkins, 1963). 

Adrenaline also appears to be implicated in 
ulceration. Phillips and Boone (1968) have 
demonstrated that the degree of stress- 
induced ulceration in rats was positively
correlated with adrenaline levels. Increased 
adrenaline levels without stress, however, did 
not increase ulceration unless the adrenaline 
was injected in quantities exceeding 300 μg.

It is interesting to note that most studies 
which have examined stress-induced ulcera- 
tion have used immobility or immobility plus 
shock as the stressors (Weiss, 1968). It was 
indicated in the previous sections that adren- 
aline, ACTH, and the glucocorticoids appear 
to serve a physiologically and behaviorally 
adaptive function in those situations which 
call for active coping in the presence of a 
noxious UCS (a stressor). It is not surprising 
therefore that experimental procedures which 
lead to the secretion of the various hormones, 
but which prevent active coping, should maxi- 
mize the incidence of ulceration. These situ-
ations maximize hormonal secretions, prevent
the offset of the hormonal system (Figure 1),
and appear to antagonize the hormones'
appropriate adaptive functions.